Abstract

An efficient distributed spectrum sharing mechanism is crucial for improving spectrum utilization. The spatial aspect of spectrum sharing, however, is less understood than many other aspects. In this paper, we generalize a recently proposed spatial congestion game framework to design efficient distributed spectrum access mechanisms with spatial reuse. We first propose a spatial channel selection game to model the distributed channel selection problem with fixed user locations. We show that the game is a potential game, and develop a distributed learning mechanism that converges to a Nash equilibrium based only on users' local observations. We then formulate the joint channel and location selection problem as a spatial channel selection and mobility game, and show that it is also a potential game. We next propose a distributed strategic mobility algorithm that, jointly with the distributed learning mechanism, converges to a Nash equilibrium. Numerical results show that the Nash equilibria achieved by the proposed algorithms incur less than 8% performance loss compared with the centralized optimal solutions.

I. Introduction 




Dynamic spectrum sharing is envisioned as a promising technique to alleviate the problem of spectrum under-utilization [1]. It enables unlicensed wireless users (secondary users) to opportunistically access the licensed channels owned by legacy spectrum holders (primary users), and thus can significantly improve the spectrum efficiency [2].



A key challenge of dynamic spectrum sharing is how to resolve the resource competition among selfish secondary users in a decentralized fashion. If multiple secondary users transmit over the same channel simultaneously, severe interference can result and reduce the data rates of all of them. It is therefore necessary to design efficient distributed spectrum sharing mechanisms.

The competition among secondary users for common spectrum resources has often been studied using noncooperative game theory (e.g., [3]-[7]). Nie and Comaniciu [4] designed a self-enforcing distributed spectrum access mechanism based on potential games. Niyato and Hossain [5] studied a price-based spectrum access mechanism for competitive secondary users. Félegyházi et al. [6] proposed a two-tier game framework for medium access control (MAC) mechanism design. Law et al. [7] studied the system performance degradation due to users' selfish behaviors in spectrum access games.

Fig. 1. Illustration of distributed spectrum access with spatial reuse.

Without knowledge of spectrum information such as channel availabilities, secondary users need to learn the environment and adapt their spectrum access decisions accordingly. Han et al. [8] and Maskery et al. [9] used no-regret learning to solve this problem, assuming that the users' channel selections are common information. The learning converges to a correlated equilibrium [10], wherein the commonly observed history serves as a signal to coordinate all users' channel selections. When users' channel selections are not observable, the authors of [11]-[13] designed multi-agent multi-armed bandit learning algorithms to minimize the expected performance loss of distributed spectrum access.

A common assumption of the above results is that secondary users are close to each other and interfere with each other whenever they transmit on the same channel simultaneously. However, a critical feature of spectrum sharing in wireless communication is spatial reuse. If wireless users are located sufficiently far apart, they can transmit in the same frequency band simultaneously without causing any performance degradation (see Fig. 1 for an illustration). Such spatial effects on distributed spectrum sharing are less understood than many other aspects in the existing literature [14], which motivates this study.

Recently, Tekin et al. [15] and Southwell et al. [16] proposed a novel spatial congestion game framework that takes spatial relationships into account. The key idea is to extend the classical congestion game to a general undirected graph, by assuming that a player's payoff depends on the number of its neighbors that choose the same resource (i.e., users are homogeneous in terms of channel contention). The homogeneity assumption follows from the setup of the classical congestion game (which only works on a fully connected graph). The applicability of such a homogeneous model, however, is quite restricted, since users typically have heterogeneous channel contention probabilities in wireless systems. For example, users with heterogeneous wireless channel conditions may have heterogeneous packet transmission error rates, which in turn result in heterogeneous equilibrium contention window sizes under the distributed coordination function (DCF) of IEEE 802.11 networks [17]; heterogeneous equilibrium contention window sizes in turn imply heterogeneous channel contention probabilities. As another example, users running heterogeneous applications have heterogeneous channel access priorities under the enhanced distributed channel access (EDCA) mechanism of IEEE 802.11e networks [18]. In this paper, we extend the spatial congestion game framework to formulate the random access based distributed spectrum sharing problem with spatial reuse, taking users' heterogeneous channel contention probabilities into account. Such an extension is highly non-trivial, and significantly expands the possible applications of the model. Moreover, we propose distributed algorithms to achieve Nash equilibria of the generalized spatial games.

We consider two game models in this paper. In the first model, secondary users have fixed spectrum access locations, and each user selects a channel to maximize its own utility in a distributed manner. We model this problem as a spatial channel selection game. In the second, more general model, users are mobile and can select channels and spectrum access locations simultaneously in order to better exploit the gain of spatial reuse. We formulate this problem as a joint spatial channel selection and mobility game. The main results and contributions of this paper are as follows:

• General game formulation: We formulate the spatial channel selection problem and the joint channel and location selection problem as noncooperative games on general interference graphs, with heterogeneous channel data rates that depend on both the user and the location.

• Existence of Nash equilibrium and finite improvement property: We show that both the spatial channel selection game and the joint spatial channel selection and mobility game are potential games; hence each always has at least one Nash equilibrium and possesses the finite improvement property.

• Distributed algorithms for achieving Nash equilibrium: For the spatial channel selection game, we propose a distributed learning algorithm that globally converges to a Nash equilibrium using only users' local observations. For the spatial channel selection and mobility game, we propose a distributed strategic mobility algorithm that, when used jointly with the distributed learning algorithm, also converges to a Nash equilibrium.

• Superior performance: Numerical results show that the Nash equilibria achieved by the proposed algorithms incur less than 8% performance loss compared with the centralized optimal solutions.
The rest of the paper is organized as follows. We introduce the system model and the spatial channel selection game in Sections II and III, respectively. We present the distributed learning mechanism for spatial channel selection in Section IV. Then we introduce the joint spatial channel selection and mobility game in Section V and study the uniqueness and efficiency of Nash equilibrium in Section VI. We illustrate the performance of the proposed mechanisms through numerical results in Section VII and finally conclude in Section VIII.



II. System Model

We consider a dynamic spectrum sharing network with a set $\mathcal{M} = \{1, 2, \ldots, M\}$ of independent and stochastically heterogeneous primary channels. A set $\mathcal{N} = \{1, 2, \ldots, N\}$ of secondary users try to access these channels in a distributed manner when the channels are not occupied by primary (licensed) transmissions.

To take the spatial relationship into account, we assume that the secondary users are located in a spatial domain $\mathcal{D}$, i.e., a finite set of possible spectrum access locations. We denote by $d_n \in \mathcal{D}$ the location of user $n$, and by $\boldsymbol{d} = (d_1, \ldots, d_N) \in \Pi = \mathcal{D}^N$ the location profile of all users. Each secondary user has a transmission range $\delta$. Given the location profile $\boldsymbol{d}$ of all users, we can then obtain the interference graph $G_{\boldsymbol{d}} = \{\mathcal{N}, \mathcal{E}_{\boldsymbol{d}}\}$ that describes the interference relationship among users (see Fig. 1 for an example). Here the vertex set $\mathcal{N}$ is the secondary user set, and the edge set $\mathcal{E}_{\boldsymbol{d}} = \{(i, j) : \|d_i, d_j\| < \delta,\ \forall i, j \in \mathcal{N},\ i \neq j\}$ is the set of interference edges (with $\|d_i, d_j\|$ being the distance between locations $d_i$ and $d_j$). If there is an interference edge between two secondary users, then they cannot successfully transmit their data on the same idle channel simultaneously due to collision. In the sequel, we also denote the set of users interfering with user $n$ (i.e., user $n$'s "neighbors") under the location profile $\boldsymbol{d}$ as $\mathcal{N}_n(\boldsymbol{d}) = \{i : (n, i) \in \mathcal{E}_{\boldsymbol{d}},\ i \in \mathcal{N}\}$.

We consider a time-slotted system model as follows:

• Channel state: for each primary channel $m$, the channel state at time slot $t$ is
$$S_m(t) = \begin{cases} 0, & \text{if channel } m \text{ is occupied by primary transmissions},\\ 1, & \text{if channel } m \text{ is idle}. \end{cases}$$

Fig. 2. Two-state Markovian channel model.



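• Channel state changing: the state of a channel changes according to a two-state Markovian process [19], [20] (see Fig. 2). We denote the channel state probability vector of channel $m$ at time $t$ as $\boldsymbol{q}_m(t) = (\Pr\{S_m(t) = 0\}, \Pr\{S_m(t) = 1\})$, which forms a Markov chain as $\boldsymbol{q}_m(t) = \boldsymbol{q}_m(t-1)\Gamma_m, \forall t \geq 1$, with the transition matrix
$$\Gamma_m = \begin{pmatrix} 1 - \varepsilon_m & \varepsilon_m \\ \xi_m & 1 - \xi_m \end{pmatrix},$$
where $\varepsilon_m$ and $\xi_m$ denote the probabilities that channel $m$ switches from busy to idle and from idle to busy, respectively. Furthermore, the long-run statistical channel availability $\theta_m \in (0, 1)$ of channel $m$ can be obtained from the stationary distribution of the Markov chain, i.e.,
$$\theta_m = \frac{\varepsilon_m}{\varepsilon_m + \xi_m}. \qquad (1)$$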



• User-and-location specific channel throughput: for each secondary user $n$ at location $d$, its realized data rate $b^n_{m,d}(t)$ on an idle channel $m$ in each time slot $t$ evolves according to an i.i.d. random process with mean $B^n_{m,d}$, due to users' heterogeneous transmission technologies and local environmental effects such as fading [21]. For example, we can compute the data rate according to the Shannon capacity as
$$b^n_{m,d}(t) = B_m \log_2\left(1 + \frac{\zeta_n g^n_{m,d}(t)}{\omega^n_{m,d}}\right), \qquad (2)$$
where $B_m$ is the bandwidth of channel $m$, $\zeta_n$ is the fixed transmission power adopted by user $n$ according to requirements such as primary user protection, $\omega^n_{m,d}$ denotes the background noise power, and $g^n_{m,d}(t)$ is the channel gain. In a Rayleigh fading channel environment, the channel gain $g^n_{m,d}(t)$ is a realization of a random variable that follows the exponential distribution [21].
• Time slot structure: each secondary user $n$ executes the following stages synchronously during each time slot:

- Channel sensing: sense one of the channels based on the channel selection decision generated at the end of the previous time slot¹.

- Channel contention: we use a persistence-probability-based random access mechanism², i.e., user $n$ contends for an idle channel with probability $p_n \in \mathcal{P} = [p_{\min}, p_{\max}]$, where $0 < p_{\min} < p_{\max} < 1$ denote the minimum and maximum contention probabilities. If multiple interfering users contend for the same channel, a collision occurs and none of them can transmit. Since each user (i.e., a wireless device) typically has a limited battery, to achieve a longer expected lifetime we limit each user's channel contention in a time slot as
$$c_n p_n \leq v_n, \qquad (3)$$
where $c_n$ denotes the energy that user $n$ consumes in one channel contention and $v_n$ denotes the energy constraint of user $n$.

- Data transmission: transmit data packets if the user is the only one contending for an idle channel (i.e., no collision is detected).

- Channel selection: choose a channel to access in the next time slot according to the distributed learning mechanism (introduced in Section IV).
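The two channel-model ingredients above, the long-run availability of a two-state Markov channel and the mean rate implied by the Shannon formula under Rayleigh fading, can be checked numerically. The Python sketch below is illustrative only: the function names, the all-busy initial state, and the parameter values are our own assumptions, and the transition matrix is written with a busy-to-idle probability `eps` and an idle-to-busy probability `xi`. It propagates $\boldsymbol{q}_m(t) = \boldsymbol{q}_m(t-1)\Gamma_m$ until it settles, and estimates the mean rate by Monte Carlo sampling of exponentially distributed channel gains.

```python
import math
import random


def stationary_availability(eps: float, xi: float, steps: int = 10_000) -> float:
    """Iterate q_m(t) = q_m(t-1) @ Gamma_m with
    Gamma_m = [[1 - eps, eps], [xi, 1 - xi]] and return Pr{S_m = 1}.

    eps: busy -> idle probability, xi: idle -> busy probability (assumed names).
    """
    q_busy, q_idle = 1.0, 0.0  # hypothetical start: channel surely busy
    for _ in range(steps):
        # Row vector (q_busy, q_idle) times the 2x2 transition matrix
        q_busy, q_idle = (q_busy * (1.0 - eps) + q_idle * xi,
                          q_busy * eps + q_idle * (1.0 - xi))
    return q_idle


def mean_shannon_rate(bandwidth: float, power: float, noise: float,
                      mean_gain: float, samples: int = 200_000,
                      seed: int = 0) -> float:
    """Monte Carlo estimate of the mean of the Shannon rate when the
    channel gain is exponentially distributed (Rayleigh fading)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        gain = rng.expovariate(1.0 / mean_gain)  # argument is the rate 1/mean
        total += bandwidth * math.log2(1.0 + power * gain / noise)
    return total / samples


if __name__ == "__main__":
    eps, xi = 0.3, 0.2
    print(stationary_availability(eps, xi), eps / (eps + xi))  # both about 0.6
    print(mean_shannon_rate(bandwidth=1.0, power=1.0, noise=1.0, mean_gain=1.0))
```

With unit bandwidth, power, noise, and mean gain, the estimate should land just below the Jensen upper bound $\log_2(1 + 1) = 1$ (the exact expectation is about 0.86), illustrating that fading lowers the mean rate relative to the rate at the mean gain.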

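Let $a_n \in \mathcal{M}$ be the channel selected by user $n$, $\boldsymbol{a} = (a_1, \ldots, a_N) \in \mathcal{A} = \mathcal{M}^N$ be the channel selection profile of all users, and $\boldsymbol{p} = (p_1, \ldots, p_N)$ be the channel contention probability profile of all users. We can then obtain the long-run expected throughput of each user $n$ choosing channel $a_n$ at location $d_n$ as
$$Q_n(\boldsymbol{d}, \boldsymbol{a}, \boldsymbol{p}) = \theta_{a_n} B^n_{a_n, d_n} p_n \prod_{i \in \mathcal{N}^{a_n}_n(\boldsymbol{d}, \boldsymbol{a})} (1 - p_i), \qquad (4)$$
where $\mathcal{N}^{a_n}_n(\boldsymbol{d}, \boldsymbol{a}) = \{i : a_i = a_n \text{ and } i \in \mathcal{N}_n(\boldsymbol{d})\}$ is the set of interfering users that choose the same channel as user $n$. To take fairness into account, we consider the proportional-fair utility function [23] in this study, i.e.,
$$U_n(\boldsymbol{d}, \boldsymbol{a}, \boldsymbol{p}) = \log Q_n(\boldsymbol{d}, \boldsymbol{a}, \boldsymbol{p}). \qquad (5)$$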
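As a small illustration of the throughput and utility definitions, the following Python sketch evaluates $Q_n$ and $U_n = \log Q_n$ for a fixed location profile. It assumes the interference graph is given directly as neighbor sets (standing in for $\mathcal{N}_n(\boldsymbol{d})$) and that `rate[n][m]` stands in for the mean rate $B^n_{m,d_n}$; the function names and numbers are hypothetical. The example uses a three-user line graph to show spatial reuse: users 0 and 2 share channel 0 without harming each other, while user 1 is penalized by both.

```python
import math
from typing import Dict, Sequence, Set


def throughput(n: int, a: Sequence[int], p: Sequence[float],
               theta: Sequence[float], rate: Sequence[Sequence[float]],
               neighbors: Dict[int, Set[int]]) -> float:
    """Long-run expected throughput Q_n(d, a, p) for a fixed location profile.

    neighbors[n] plays the role of N_n(d) and rate[n][m] of B^n_{m, d_n}.
    """
    q = theta[a[n]] * rate[n][a[n]] * p[n]  # channel idle and n contends
    for i in neighbors[n]:
        if a[i] == a[n]:      # i belongs to N_n^{a_n}(d, a) ...
            q *= 1.0 - p[i]   # ... and must stay silent for n to succeed
    return q


def utility(n, a, p, theta, rate, neighbors) -> float:
    """Proportional-fair utility U_n = log Q_n."""
    return math.log(throughput(n, a, p, theta, rate, neighbors))


if __name__ == "__main__":
    # Line graph 0 - 1 - 2: users 0 and 2 are out of each other's range.
    neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
    a = [0, 0, 0]  # everyone picks channel 0
    p = [0.5, 0.2, 0.4]
    theta = [0.8, 0.6]
    rate = [[1.0, 1.0]] * 3
    print([round(throughput(n, a, p, theta, rate, neighbors), 3) for n in range(3)])
    # -> [0.32, 0.048, 0.256]
```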

¹This paper focuses on the spatial aspect of distributed spectrum sharing and assumes that users are capable of perfect spectrum sensing. Imperfect spectrum sensing would lead to lower spectrum utilization for the user. For example, a false alarm mistakenly reports an idle channel as busy and hence wastes a spectrum opportunity, while a missed detection mistakenly reports a busy channel as idle and results in a transmission collision with primary users. In this case, we can add a factor $\lambda_n$ into the throughput function in (4) to describe the performance of the user's spectrum sensing: if $\lambda_n = 1$, the user has perfect spectrum sensing, and if $\lambda_n < 1$, the user has imperfect spectrum sensing. Since $\lambda_n$ does not depend on other secondary users' activities, the analysis in this paper remains valid.

²This model can also provide useful insights for the case in which a contention-window-based random access mechanism is implemented, since the persistence probability $p_n$ is related to the contention window size $w_n$ according to $p_n = \frac{2}{w_n + 1}$ [22].

Other types of utility functions, such as general alpha-fairness, will be considered in future work.

Equations (4) and (5) show that user $n$'s utility $U_n(\boldsymbol{d}, \boldsymbol{a}, \boldsymbol{p})$ is an increasing function of its contention probability $p_n$. This implies that, when a user is aggressive and does not care about collisions, it can adopt the maximum channel contention probability satisfying the energy constraint (3), i.e., $p_n = \min\{p_{\max}, v_n / c_n\}$. When users take the cost of collisions into account, we can adopt the game-theoretic framework for contention control in [24]. Furthermore, a dynamic contention control scheme is proposed in [24] that converges to a stable channel contention probability profile from which no user can further improve unilaterally. In this paper, we hence assume that the channel contention probability $p_n$ of each user is fixed and focus on the issues of distributed location and channel selection. For brevity, we also denote the utility of each user $n$ as $U_n(\boldsymbol{d}, \boldsymbol{a})$, where the decision variables are the location selections $\boldsymbol{d}$ and the channel selections $\boldsymbol{a}$ only. Since our analysis is from the secondary users' perspective, we will use the terms "secondary user" and "user" interchangeably.

III. Spatial Channel Selection 

We first consider the case in which all users' locations $\boldsymbol{d}$ are fixed, and each user tries to maximize its own utility by choosing a proper channel in a distributed manner. Given other users' channel selections $\boldsymbol{a}_{-n}$, the problem faced by a user $n$ is
$$\max_{a_n \in \mathcal{M}} U_n(\boldsymbol{d}, a_n, \boldsymbol{a}_{-n}), \quad \forall n \in \mathcal{N}. \qquad (6)$$

The distributed nature of the spatial channel selection problem naturally leads to a game-theoretic formulation, in which users self-organize into a mutually acceptable channel selection profile (Nash equilibrium) $\boldsymbol{a}^* = (a^*_1, a^*_2, \ldots, a^*_N)$ with
$$a^*_n = \arg\max_{a_n \in \mathcal{M}} U_n(\boldsymbol{d}, a_n, \boldsymbol{a}^*_{-n}), \quad \forall n \in \mathcal{N}. \qquad (7)$$

We next formulate the spatial channel selection problem as a game, and further show the existence of a Nash equilibrium.

A. Spatial Congestion Game 

We first review the spatial congestion game introduced in [15]. Spatial congestion games are a class of strategic games represented by $\Gamma = (\mathcal{N}, \mathcal{M}, \{\mathcal{N}_n(\boldsymbol{d})\}_{n \in \mathcal{N}}, \{U_n\}_{n \in \mathcal{N}})$. Specifically, $\mathcal{N}$ is the set of players, $\mathcal{M}$ is the set of resources, and $\mathcal{N}_n(\boldsymbol{d})$ is the set of players that can cause congestion to player $n$ when they use the same resource. The payoff of player $n$ for using resource $a_n \in \mathcal{M}$ is $U_n(\boldsymbol{a}) = f^n_{a_n}(C^n_{a_n}(\boldsymbol{a}))$, where $C^n_{a_n}(\boldsymbol{a}) = \sum_{i \in \mathcal{N}_n(\boldsymbol{d})} I_{\{a_i = a_n\}}$ denotes the number of players in the set $\mathcal{N}_n(\boldsymbol{d})$ that choose the same resource $a_n$ as user $n$, and $f^n_{a_n}(\cdot)$ denotes some user-specific payoff function. $C^n_{a_n}(\boldsymbol{a})$ is also called the congestion level.

Note that the classical congestion games can be viewed as a special case of the spatial congestion games obtained by setting the interference graph $G_{\boldsymbol{d}}$ to be a complete graph, i.e., $\mathcal{N}_n(\boldsymbol{d}) = \mathcal{N} \setminus \{n\}$. For the classical congestion game, it is shown in [25] that it is an (exact) potential game, which is defined as follows.

Definition 1 (Potential Game [25]). A game is called a weighted potential game if it admits a potential function $\Phi(\boldsymbol{a})$ such that for every $n \in \mathcal{N}$ and $\boldsymbol{a}_{-n} \in \mathcal{M}^{N-1}$,
$$\Phi(a'_n, \boldsymbol{a}_{-n}) - \Phi(a_n, \boldsymbol{a}_{-n}) = w_n \left( U_n(a'_n, \boldsymbol{a}_{-n}) - U_n(a_n, \boldsymbol{a}_{-n}) \right),$$
where $w_n > 0$ is some positive constant. In particular, if $w_n = 1, \forall n \in \mathcal{N}$, then the game is also called an exact potential game.

Definition 2 (Better Response Update [25]). The event where a player $n$ changes to an action $a'_n$ from the action $a_n$ is a better response update if and only if $U_n(a'_n, \boldsymbol{a}_{-n}) > U_n(a_n, \boldsymbol{a}_{-n})$.

Definition 3 (Finite Improvement Property [25]). A game has the finite improvement property if any asynchronous better response update process (i.e., no more than one player updates its strategy at any given time) terminates at a pure Nash equilibrium within a finite number of updates.

An appealing property of a (finite) potential game is that it possesses the finite improvement property, which guarantees the existence of a Nash equilibrium. When a general payoff function $f^n_{a_n}(\cdot)$ is considered, however, the spatial congestion game does not necessarily possess such a nice property [15]. We next extend the spatial congestion game framework to the random access mechanism in Section II and show that the spatial channel selection problem in (6) with the payoff function given in (5) is a potential game.

B. Generalized Spatial Congestion Game Formulation 

As mentioned, the spatial congestion game proposed in [15] assumes that a player's utility depends on the number of its neighbors that choose the same resource. In our case, however, a user's utility in (5) depends on which of its neighbors (rather than how many of them) contend for the same channel, since users have heterogeneous channel contention probabilities. We hence generalize the spatial congestion game framework to the random access mechanism in Section II by extending the definition of the congestion level $C^n_{a_n}(\boldsymbol{a})$. According to (4) and (5), we have
$$U_n(\boldsymbol{d}, \boldsymbol{a}) = \log\left(\theta_{a_n} B^n_{a_n, d_n} p_n\right) + \sum_{i \in \mathcal{N}^{a_n}_n(\boldsymbol{d}, \boldsymbol{a})} \log(1 - p_i).$$

We then extend the definition of $C^n_{a_n}(\boldsymbol{a})$ in the standard spatial congestion game by setting $C^n_{a_n}(\boldsymbol{a}) = \sum_{i \in \mathcal{N}^{a_n}_n(\boldsymbol{d}, \boldsymbol{a})} \log(1 - p_i)$. Here $C^n_{a_n}(\boldsymbol{a})$ is regarded as the generalized congestion level perceived by user $n$ on channel $a_n$. When all users have the same channel contention probability $p_i = p$, we have $C^n_{a_n}(\boldsymbol{a}) = \log(1 - p) \sum_{i \in \mathcal{N}_n(\boldsymbol{d})} I_{\{a_i = a_n\}}$, which reduces to the standard case up to the constant factor $\log(1 - p)$. The user-specific payoff function is then $f^n_{a_n}(C^n_{a_n}(\boldsymbol{a})) = \log(\theta_{a_n} B^n_{a_n, d_n} p_n) + C^n_{a_n}(\boldsymbol{a})$. In the following, we refer to this game formulation as the spatial channel selection game. We show that

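Lemma 1. The spatial channel selection game on a general interference graph $G_{\boldsymbol{d}}$ is a weighted potential game, with the potential function
$$\Phi(\boldsymbol{d}, \boldsymbol{a}) = -\sum_{i=1}^{N} \log(1 - p_i) \left( \frac{1}{2} \sum_{j \in \mathcal{N}^{a_i}_i(\boldsymbol{d}, \boldsymbol{a})} \log(1 - p_j) + \log\left(\theta_{a_i} B^i_{a_i, d_i} p_i\right) \right), \qquad (8)$$
and the weight $w_i = -\log(1 - p_i)$.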

The proof is given in Appendix A. It follows from Lemma 1 that

Theorem 1. The spatial channel selection game on a general interference graph has a Nash equilibrium 
and the finite improvement property. 
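Lemma 1 can be sanity-checked numerically. The Python sketch below (a hypothetical random instance with illustrative function names, not the authors' code) draws a random undirected interference graph and random parameters, applies random unilateral channel deviations, and reports the largest gap between the change in the potential function of Lemma 1 and $w_n = -\log(1 - p_n)$ times the change in $U_n$. The check relies on the graph being undirected: each same-channel pair appears twice in the double sum, which is why the pairwise term carries the factor 1/2.

```python
import itertools
import math
import random


def random_instance(N, M, seed, edge_prob=0.4):
    """Hypothetical instance: undirected neighbor sets N_n(d) plus random
    contention probabilities p, availabilities theta and rates B^n_{m,d_n}."""
    rng = random.Random(seed)
    neighbors = {n: set() for n in range(N)}
    for i, j in itertools.combinations(range(N), 2):
        if rng.random() < edge_prob:
            neighbors[i].add(j)
            neighbors[j].add(i)  # symmetric: the interference graph is undirected
    p = [rng.uniform(0.1, 0.9) for _ in range(N)]
    theta = [rng.uniform(0.2, 0.9) for _ in range(M)]
    rate = [[rng.uniform(1.0, 10.0) for _ in range(M)] for _ in range(N)]
    return neighbors, p, theta, rate


def utility(n, a, neighbors, p, theta, rate):
    """U_n(d, a) = log(theta B p_n) + sum of log(1 - p_i) over same-channel neighbors."""
    own = math.log(theta[a[n]] * rate[n][a[n]] * p[n])
    return own + sum(math.log(1.0 - p[i]) for i in neighbors[n] if a[i] == a[n])


def potential(a, neighbors, p, theta, rate):
    """Potential function Phi(d, a) stated in Lemma 1."""
    total = 0.0
    for i in range(len(a)):
        pair = 0.5 * sum(math.log(1.0 - p[j]) for j in neighbors[i] if a[j] == a[i])
        own = math.log(theta[a[i]] * rate[i][a[i]] * p[i])
        total += -math.log(1.0 - p[i]) * (pair + own)
    return total


def max_lemma1_violation(N=6, M=3, seed=1, trials=500):
    """Largest |Delta Phi - w_n * Delta U_n| over random unilateral deviations."""
    neighbors, p, theta, rate = random_instance(N, M, seed)
    rng = random.Random(seed + 1)
    worst = 0.0
    for _ in range(trials):
        a = [rng.randrange(M) for _ in range(N)]
        n = rng.randrange(N)
        b = list(a)
        b[n] = rng.randrange(M)  # user n deviates alone
        d_phi = potential(b, neighbors, p, theta, rate) - potential(a, neighbors, p, theta, rate)
        d_u = utility(n, b, neighbors, p, theta, rate) - utility(n, a, neighbors, p, theta, rate)
        w_n = -math.log(1.0 - p[n])  # weight from Lemma 1
        worst = max(worst, abs(d_phi - w_n * d_u))
    return worst


if __name__ == "__main__":
    print(max_lemma1_violation())  # should be at floating-point noise level
```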

By the finite improvement property, any asynchronous better response update process leads to a Nash equilibrium. However, a better response update requires each user to know the strategies of other users in order to select a strategy that improves its payoff. This requires extensive information exchange among the users, and the resulting signaling overhead and energy consumption can be significant or even prohibitive in some network scenarios. We next propose a distributed learning mechanism that utilizes only users' local observations and converges to a Nash equilibrium.
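For contrast with the distributed mechanism, the Python sketch below runs the full-information asynchronous better response process that Theorem 1 guarantees to terminate. It bundles its own hypothetical random-instance generator and utility so that it is self-contained: users are visited round-robin, and each switches to the first channel that strictly improves its utility (Definition 2). The tolerance `TOL` is our own addition to avoid treating floating-point ties as improvements; every accepted update strictly increases the potential, so no profile can repeat.

```python
import itertools
import math
import random

TOL = 1e-12  # strict-improvement threshold (our addition) to avoid rounding ties


def random_instance(N, M, seed, edge_prob=0.4):
    """Hypothetical instance: undirected neighbor sets plus random p, theta, rate."""
    rng = random.Random(seed)
    neighbors = {n: set() for n in range(N)}
    for i, j in itertools.combinations(range(N), 2):
        if rng.random() < edge_prob:
            neighbors[i].add(j)
            neighbors[j].add(i)
    p = [rng.uniform(0.1, 0.9) for _ in range(N)]
    theta = [rng.uniform(0.2, 0.9) for _ in range(M)]
    rate = [[rng.uniform(1.0, 10.0) for _ in range(M)] for _ in range(N)]
    return neighbors, p, theta, rate


def utility(n, a, neighbors, p, theta, rate):
    """U_n(d, a) for fixed locations: own term plus same-channel neighbor terms."""
    own = math.log(theta[a[n]] * rate[n][a[n]] * p[n])
    return own + sum(math.log(1.0 - p[i]) for i in neighbors[n] if a[i] == a[n])


def better_response_dynamics(a, neighbors, p, theta, rate, max_updates=100_000):
    """Asynchronous better response updates: one user moves at a time, to the
    first channel that strictly improves its utility. Returns (profile, #updates)."""
    a = list(a)
    updates, improved = 0, True
    while improved:
        improved = False
        for n in range(len(a)):
            current = utility(n, a, neighbors, p, theta, rate)
            for m in range(len(theta)):
                trial = a[:n] + [m] + a[n + 1:]
                if utility(n, trial, neighbors, p, theta, rate) > current + TOL:
                    a, updates, improved = trial, updates + 1, True
                    break
            if updates > max_updates:
                raise RuntimeError("better response process did not terminate")
    return a, updates


def is_nash_equilibrium(a, neighbors, p, theta, rate):
    """Condition (7): no user can strictly gain by switching channels alone."""
    for n in range(len(a)):
        current = utility(n, a, neighbors, p, theta, rate)
        for m in range(len(theta)):
            trial = a[:n] + [m] + a[n + 1:]
            if utility(n, trial, neighbors, p, theta, rate) > current + TOL:
                return False
    return True


if __name__ == "__main__":
    neighbors, p, theta, rate = random_instance(N=8, M=3, seed=0)
    final, updates = better_response_dynamics([0] * 8, neighbors, p, theta, rate)
    print(final, updates, is_nash_equilibrium(final, neighbors, p, theta, rate))
```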

IV. Distributed Learning Mechanism For Spatial Channel Selection 

In this section, we introduce the distributed learning mechanism for spatial channel selection, and then show that it converges to a Nash equilibrium.

A. Distributed Learning Mechanism

Without information exchange, each user can only estimate the environment through local measurement. 
To achieve accurate estimation, a user needs to gather a large number of observation samples. This 
motivates us to divide the learning time into a sequence of decision periods indexed by $T = 1, 2, \ldots$,
where each decision period consists of $K$ time slots (see Fig. 3). During a single decision period, a
user accesses the same channel in all K time slots. Thus the total number of users accessing each channel 
does not change within a decision period, which allows users to better learn the environment. 

The key idea of distributed learning here is to adapt a user's spectrum access decision based on its accumulated experience. At the beginning of each period $T$, a user $n$ chooses a channel $a_n(T) \in \mathcal{M}$ to access according to its mixed strategy $\sigma_n(T) = (\sigma^m_n(T), \forall m \in \mathcal{M})$, where $\sigma^m_n(T)$ is the probability of choosing channel $m$. The mixed strategy is generated according to $Z_n(T) = (Z^m_n(T), \forall m \in \mathcal{M})$, which represents the user's perceptions of choosing different channels based on local estimations. We map the perceptions $Z_n(T)$ to the mixed strategy $\sigma_n(T)$ proportionally, i.e.,
$$\sigma^m_n(T) = \frac{Z^m_n(T)}{\sum_{i \in \mathcal{M}} Z^i_n(T)}, \quad \forall m \in \mathcal{M}. \qquad (9)$$

At the end of a decision period $T$, a user $n$ computes its estimated expected payoff $\bar{U}_n(T)$ as the sample average over the $K$ time slots in the period, i.e., $\bar{U}_n(T) = \frac{1}{K} \sum_{t=1}^{K} U_n(T, t)$, where $U_n(T, t)$ is the payoff received by user $n$ in time slot $t$. Then user $n$ adjusts its perceptions as
$$Z^m_n(T+1) = \frac{Z^m_n(T)}{\sum_{i \in \mathcal{M}} Z^i_n(T)} + \mu_T \bar{U}_n(T) I_{\{a_n(T) = m\}}, \quad \forall m \in \mathcal{M}, \qquad (10)$$
where $\mu_T$ is the smoothing factor and $I_{\{a_n(T) = m\}}$ indicates whether user $n$ chooses channel $m$ in period $T$. The user first normalizes the perception values (the first term on the right-hand side of (10)) and then reinforces the perception of the channel just accessed (the second term). The purpose of normalization is to keep the perception values bounded. We summarize the distributed learning mechanism for spatial channel selection in Algorithm 1.
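A minimal Python sketch of the per-user perception bookkeeping in (9) and (10) follows; the function names are hypothetical. It assumes the estimated payoff fed into the update is non-negative, so that all perceptions stay positive and the proportional map (9) remains a valid probability vector.

```python
import random
from typing import List, Sequence


def mixed_strategy(Z: Sequence[float]) -> List[float]:
    """Proportional map (9): sigma_n^m(T) = Z_n^m(T) / sum_i Z_n^i(T)."""
    total = sum(Z)
    return [z / total for z in Z]


def select_channel(Z: Sequence[float], rng: random.Random) -> int:
    """Draw a_n(T) from the mixed strategy sigma_n(T)."""
    return rng.choices(range(len(Z)), weights=mixed_strategy(Z))[0]


def update_perceptions(Z: Sequence[float], chosen: int, avg_payoff: float,
                       mu: float) -> List[float]:
    """Update (10): normalize the perceptions, then reinforce only the channel
    accessed in period T by mu_T times the estimated (non-negative) payoff."""
    total = sum(Z)
    return [z / total + (mu * avg_payoff if m == chosen else 0.0)
            for m, z in enumerate(Z)]


if __name__ == "__main__":
    Z = [1.0, 1.0, 2.0]
    Z_next = update_perceptions(Z, chosen=2, avg_payoff=2.0, mu=0.5)
    print(Z_next)                  # [0.25, 0.25, 1.5]
    print(mixed_strategy(Z_next))  # [0.125, 0.125, 0.75]
    print(select_channel(Z_next, random.Random(0)))
```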

We then analyze the computational complexity of the distributed learning algorithm. For each iteration of each user, Line 5 involves $M$ division operations in (9), and thus has a complexity of $O(M)$. Similarly, Line 11 has a complexity of $O(M)$. Lines 6 to 9 involve $K$ channel contentions in $K$ time slots and hence have a complexity of $O(K)$. Line 10 involves $K$ summation operations, which also has a complexity of $O(K)$. Suppose that the algorithm takes $C$ iterations to converge. Then the total computational complexity of the distributed learning algorithm for $N$ users is $O(CNK + CNM)$.

Fig. 3. Time structure of distributed learning.

Algorithm 1 Distributed Learning For Spatial Channel Selection
1: initialization:
2:   set the initial perception value $Z_n(1) = (\frac{1}{M}, \ldots, \frac{1}{M})$.
3: end initialization
4: loop for each decision period $T$ and each user $n$ in parallel:
5:   select a channel $a_n(T) \in \mathcal{M}$ according to the mixed strategy $\sigma_n(T)$ by (9).
6:   for each time slot $t$ in the period $T$ do
7:     sense and contend to access the channel $a_n(T)$.
8:     record the realized utility $U_n(T, t)$.
9:   end for
10:  calculate the average utility $\bar{U}_n(T) = \frac{1}{K} \sum_{t=1}^{K} U_n(T, t)$.
11:  update the perception values $Z_n(T)$ according to (10).
12: end loop
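The following Python sketch simulates Algorithm 1 end to end for a small, hypothetical instance, with comments keyed to the algorithm's line numbers. Several modeling details are simplifying assumptions of ours, flagged in the docstring: channel idleness is drawn i.i.d. per slot with probability $\theta_m$ (rather than from the Markov chain), the recorded per-slot payoff is the realized rate in [0, 1] rather than its logarithm, and the smoothing factor is $\mu_T = 1/(1 + T/50)$, which satisfies $\sum_T \mu_T = \infty$ and $\sum_T \mu_T^2 < \infty$.

```python
import random
from typing import Dict, List, Sequence, Set


def run_distributed_learning(theta: Sequence[float], rate: Sequence[Sequence[float]],
                             p: Sequence[float], neighbors: Dict[int, Set[int]],
                             periods: int = 500, K: int = 20,
                             seed: int = 0) -> List[List[float]]:
    """Simulate Algorithm 1 and return each user's final mixed strategy.

    Simplifying assumptions of this sketch (not taken from the text):
    * channel m is idle in each slot independently with probability theta[m];
    * the per-slot payoff is rate[n][m] (assumed in [0, 1]) on a successful
      transmission and 0 otherwise, instead of the log-utility;
    * the smoothing factor is mu_T = 1 / (1 + T / 50).
    """
    rng = random.Random(seed)
    N, M = len(p), len(theta)
    Z = [[1.0 / M] * M for _ in range(N)]  # Line 2: uniform initial perceptions
    for T in range(1, periods + 1):
        mu = 1.0 / (1.0 + T / 50.0)
        # Line 5: draw a_n(T) with probabilities proportional to Z_n(T), as in (9)
        a = [rng.choices(range(M), weights=Z[n])[0] for n in range(N)]
        total = [0.0] * N
        for _ in range(K):  # Lines 6-9: same channel in all K slots
            idle = [rng.random() < theta[m] for m in range(M)]
            contend = [rng.random() < p[n] for n in range(N)]
            for n in range(N):
                collision = any(contend[i] and a[i] == a[n] for i in neighbors[n])
                if idle[a[n]] and contend[n] and not collision:
                    total[n] += rate[n][a[n]]
        for n in range(N):
            avg = total[n] / K  # Line 10: sample-average payoff
            s = sum(Z[n])
            # Line 11: normalize, then reinforce the channel just used, as in (10)
            Z[n] = [z / s + (mu * avg if m == a[n] else 0.0)
                    for m, z in enumerate(Z[n])]
    return [[z / sum(Zn) for z in Zn] for Zn in Z]


if __name__ == "__main__":
    # Two neighboring users sharing two identical channels (hypothetical numbers).
    sigma = run_distributed_learning(theta=[0.8, 0.8], rate=[[1.0, 1.0]] * 2,
                                     p=[0.5, 0.5], neighbors={0: {1}, 1: {0}})
    print(sigma)
```

With a single user and one clearly better channel, the returned mixed strategy should concentrate on that channel after a few hundred periods.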

B. Dynamics of Distributed Learning 

We then study the dynamics of the distributed learning mechanism, which provide useful insights into its convergence.

First of all, it is easy to show that the learning procedures in (9) and (10) correspond to the following discrete-time dynamics.

Lemma 2. For the distributed learning mechanism for spatial channel selection, the discrete-time dynamics are given by
$$\sigma^m_n(T+1) = \sigma^m_n(T) + \frac{\mu_T \bar{U}_n(T) \left( I_{\{a_n(T) = m\}} - \sigma^m_n(T) \right)}{1 + \mu_T \bar{U}_n(T)}, \quad \forall m \in \mathcal{M}, n \in \mathcal{N}. \qquad (11)$$
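The step from the perception updates to Lemma 2 can be made explicit. Substituting the proportional map (9) into the first term of the update (10), summing over channels (using $\sum_{m \in \mathcal{M}} I_{\{a_n(T) = m\}} = 1$ and $\sum_{m} \sigma^m_n(T) = 1$), and renormalizing gives:

$$\begin{aligned}
Z^m_n(T+1) &= \sigma^m_n(T) + \mu_T \bar{U}_n(T) I_{\{a_n(T) = m\}}, \\
\sum_{i \in \mathcal{M}} Z^i_n(T+1) &= 1 + \mu_T \bar{U}_n(T), \\
\sigma^m_n(T+1) &= \frac{\sigma^m_n(T) + \mu_T \bar{U}_n(T) I_{\{a_n(T) = m\}}}{1 + \mu_T \bar{U}_n(T)}
= \sigma^m_n(T) + \frac{\mu_T \bar{U}_n(T) \left( I_{\{a_n(T) = m\}} - \sigma^m_n(T) \right)}{1 + \mu_T \bar{U}_n(T)},
\end{aligned}$$

which is exactly the discrete-time dynamics stated in Lemma 2.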

Since the updated perception value $Z^m_n(T)$ depends on the estimated payoff $\bar{U}_n(T)$, $Z^m_n(T)$ is a random variable. The equations in (11) are hence stochastic difference equations, which are difficult to analyze directly. Based on stochastic approximation theory [26], we instead focus on the analysis of their mean dynamics, which have the same limiting points as the discrete-time dynamics (11).
To proceed, we define the mapping from the mixed strategy profile $\sigma(T)$ to the expected payoff of user $n$ choosing channel $m$ as $V^m_n(\sigma(T)) = E[\bar{U}_n(T) \mid \sigma(T), a_n(T) = m]$. Here the expectation $E[\cdot]$ is taken with respect to the mixed strategy profile $\sigma(T)$ of all users. We show that

Lemma 3. For the distributed learning mechanism for spatial channel selection, when the smoothing factor $\mu_T$ satisfies $\sum_T \mu_T = \infty$ and $\sum_T \mu_T^2 < \infty$, then as $T$ goes to infinity, the sequence $\{\sigma(T), \forall T \geq 0\}$ converges to the limiting point of the differential equations
$$\frac{d\sigma^m_n(T)}{dT} = \sigma^m_n(T) \left( V^m_n(\sigma(T)) - \sum_{i \in \mathcal{M}} \sigma^i_n(T) V^i_n(\sigma(T)) \right), \quad \forall m \in \mathcal{M}, n \in \mathcal{N}. \qquad (12)$$

The proof is given in Appendix B. The mean dynamics in (12) imply that if a channel offers a user a better payoff than its current average payoff, then the user will choose that channel with a higher probability in future learning.
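To see what the mean dynamics do in isolation, the Python sketch below integrates them with forward Euler for one user whose channel payoffs $V^m_n$ are frozen (in the game they depend on the other users' mixed strategies, which this hypothetical example holds fixed; the step size and horizon are also our choices). Probability mass flows toward channels whose payoff exceeds the current average, and pure strategies are rest points of the dynamics.

```python
from typing import List, Sequence


def replicator_step(sigma: Sequence[float], V: Sequence[float],
                    dt: float) -> List[float]:
    """One forward-Euler step of
    d sigma^m / dT = sigma^m * (V^m - sum_i sigma^i V^i) for a single user."""
    avg = sum(s * v for s, v in zip(sigma, V))
    new = [s + dt * s * (v - avg) for s, v in zip(sigma, V)]
    total = sum(new)  # equals 1 up to rounding, since the increments sum to 0
    return [x / total for x in new]


def integrate(sigma: Sequence[float], V: Sequence[float],
              dt: float = 0.01, steps: int = 5000) -> List[float]:
    """Run `steps` Euler steps from the initial mixed strategy `sigma`."""
    sigma = list(sigma)
    for _ in range(steps):
        sigma = replicator_step(sigma, V, dt)
    return sigma


if __name__ == "__main__":
    # Hypothetical frozen payoffs: channel 0 is the best response.
    print(integrate([1 / 3, 1 / 3, 1 / 3], V=[1.0, 0.5, 0.2]))
```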

C. Convergence of Distributed Learning 

We now study the convergence of the mean dynamics in (12). To proceed, we first define the following functions
$$L(\sigma(T)) \triangleq E[\Phi(\boldsymbol{d}, \boldsymbol{a}) \mid \sigma(T)], \qquad (13)$$
and
$$L^i_n(\sigma(T)) \triangleq E[\Phi(\boldsymbol{d}, \boldsymbol{a}) \mid \sigma(T), a_n(T) = i]. \qquad (14)$$
Here $L(\sigma(T))$ is the expected value of the potential function $\Phi$ given the mixed strategy profile $\sigma(T)$, and $L^i_n(\sigma(T))$ is the expected value of $\Phi$ given that user $n$ chooses channel $i$ and other users adhere to the mixed strategy profile $\sigma(T)$. We show that

Lemma 4. $L^i_n(\sigma(T)) - L^j_n(\sigma(T)) = -\log(1 - p_n) \left( V^i_n(\sigma(T)) - V^j_n(\sigma(T)) \right), \forall i, j \in \mathcal{M}, n \in \mathcal{N}$.

The proof is given in Appendix C. This lemma implies that the potential property of the spatial channel selection game in (8) also holds in the expectation sense. Based on Lemma 4, we show that

Theorem 2. When the smoothing factor satisfies $\sum_T \mu_T = \infty$ and $\sum_T \mu_T^2 < \infty$, the distributed learning mechanism for spatial channel selection asymptotically converges to a Nash equilibrium.

The proof is given in Appendix D. The key idea is to show that $L(\sigma(T))$ is non-decreasing over time, i.e., $\frac{dL(\sigma(T))}{dT} \geq 0$. Since $L(\sigma(T))$ is bounded above, the learning dynamics must converge to an invariant set in which $\frac{dL(\sigma(T))}{dT} = 0$, which corresponds to the set of Nash equilibria.

V. Joint Spatial Channel Selection and Mobility 

Future mobile devices are envisioned to incorporate intelligent functionality and to be capable of
flexible spectrum access [27]. Most existing efforts (e.g., [3]-[13]), however, focus on spectrum sharing
networks with stationary secondary users. How to better utilize the gain of spatial reuse in mobile cognitive 
radio networks is less understood. Due to the heterogeneous geo-locations of primary users, the spectrum 
availabilities can be very different over the spatial dimension. A secondary user can achieve higher 
throughput if it moves to a location with higher spectrum opportunities and fewer contending users. 
This motivates us to consider the throughput-driven mobility case, in which each user has the flexibility to
change both its spectrum access location and its channel.

We note that the idea of strategic mobility is not necessarily applicable to all communication scenarios. For example, in vehicular ad-hoc networks, a user's mobility is typically determined by the driver's plan, so the idea of strategic mobility for better network throughput may not apply. However, there are some networking scenarios in which strategic mobility can be very useful. For example, in areas of poor connectivity, cellular phone users often try to find a location with better connectivity by moving around and observing the signal strength bars. As another example, at many large academic conferences, a user often experiences a poor Wi-Fi connection in a conference room with many attendees; the connection becomes much better when the user moves into the conference lobby, just tens of meters away, where there are far fewer users. To summarize, a user has an incentive to move if it has to complete an urgent communication task and the movement is within a reasonable distance.

Strategic mobility has also been discussed in several related works. Satyanarayanan [28] proposed
strategic mobility for better network service as an important function of pervasive computing.
An envisioned scenario is that a software agent can intelligently gather information from both the network 
and user and provide appropriate suggestions about location changing to the user so that the user can 
achieve better communication performance. Inspired by this, Balachandran et al. [29] proposed a
network-directed roaming approach to relieve congestion in public area wireless access point networks. 
When an access point (i.e., a location) is over-loaded, the feedback about where to move to get less-loaded 
access points will be provided to users. However, this approach computes the feedback in a centralized 
manner from the perspective of the network, and does not take the selfish nature of users into account. 
For example, it is possible that most users would choose to move to the same closest access point, which 
would also cause severe congestion at the new access point. We note that the strategic mobility game model
in our paper works in a distributed fashion from the perspective of each individual user. For example, 
each user can first inform its software agent about the set of preferable candidate locations. Then all the 
software agents can apply the proposed algorithm to identify a mutually acceptable location selection 
profile for all the users (i.e., Nash equilibrium of the game). 

Our proposed algorithm is also relevant to the vision of having networks of mobile agents (e.g., robots) 
autonomously performing sensing and communication tasks [30]. One critical issue of these networks is
how to utilize strategic mobility to improve communication performance [30]. For example, wireless
mobile camera sensors with the purpose of reporting a static target to a data sink can improve their 
reporting data rates (i.e., achieving a higher video streaming quality) by moving strategically among the 
feasible locations subject to the geographical constraints of the reporting tasks. The strategic mobility 
game solution in this paper, which requires no information exchange among the sensors for negotiating 
the location selections, can be very useful for designing a self-organizing system for such a scenario. 

A. Strategic Mobility Game with Fixed Channel Selection 

We first study the case where the channel selection profile of all users is fixed, and users try to choose proper spectrum access locations to maximize their own payoffs in a distributed manner. Without loss of generality, we assume that the locations in the spatial domain $\mathcal{A}$ are connected³, i.e., any location can be reached from any other location. We further introduce the user-specific location selection space $\mathcal{A}_n \subseteq \mathcal{A}$ to characterize user heterogeneity in mobility preference. For example, $\mathcal{A}_n$ can be the set of preferable candidate locations that user $n$ inputs to its software agent in the context of pervasive computing. If user $n$ is willing to move to all possible locations, then $\mathcal{A}_n = \mathcal{A}$. If the user does not want to move, then $\mathcal{A}_n = \{d_n\}$, where $d_n$ is user $n$'s fixed location. As another example, in the context of mobile sensor networks, $\mathcal{A}_n$ is the set of feasible locations that sensor $n$ can move to, subject to the geographical constraints of its sensing tasks. We then introduce the strategic mobility game $\Omega = (\mathcal{N}, \mathbf{d}, \{U_n\}_{n\in\mathcal{N}})$, where $\mathcal{N}$ is the set of users, $\mathbf{d} = (d_1, \ldots, d_N) \in \Theta = \mathcal{A}_1 \times \cdots \times \mathcal{A}_N$ is the location profile of all users, and $U_n(\mathbf{d}, \mathbf{a})$ is the payoff of user $n$ given the fixed channel selection profile $\mathbf{a}$ of all users. A location profile $\mathbf{d}^* = (d_n^*, \mathbf{d}_{-n}^*)$ is a Nash equilibrium under a fixed $\mathbf{a}$ if and only if

$$d_n^* = \arg\max_{d_n \in \mathcal{A}_n} U_n(d_n, \mathbf{d}_{-n}^*, \mathbf{a}), \quad \forall n \in \mathcal{N}. \qquad (15)$$

3 For the case that the spatial domain is not connected, it can be partitioned into multiple connected sub-domains. 
We show that

Lemma 5. The strategic mobility game $\Omega$ is a weighted potential game, with the same potential function $\Phi(\mathbf{d}, \mathbf{a})$ as in the spatial channel selection game and the weight $w_n = -\log(1 - p_n)$.

The proof is given in Appendix E. According to the property of potential games, it follows that

Theorem 3. The strategic mobility game $\Omega$ has a Nash equilibrium and the finite improvement property.
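The equilibrium condition (15) and the finite improvement property of Theorem 3 can be illustrated on a small, hypothetical instance. This sketch assumes a logarithmic utility $U_n = \log(\theta_{a_n}B^n_{a_n,d_n}p_n) + \sum_j \log(1-p_j)$, summed over same-channel users within distance `DELTA` (the form used in the appendix), and every rate and probability is invented. It runs asynchronous better-response updates, which must stop by the finite improvement property, and then checks (15) by brute force.

```python
import math
import random

# Hypothetical toy instance (not from the paper): five locations on a line and
# three users whose channels are fixed (all on channel 0).
LOCATIONS = range(5)
DELTA = 1                            # users within this distance interfere
p = [0.3, 0.5, 0.7]                  # contention probabilities p_n
a = [0, 0, 0]                        # fixed channel selection profile
rate = [[1.0, 2.0, 1.5, 0.5, 1.0],   # hypothetical theta * B * p_n per location
        [2.0, 1.0, 1.0, 2.0, 0.5],
        [0.5, 1.5, 2.0, 1.0, 1.0]]


def neighbors(n, d):
    """Users that interfere with n: same channel and within distance DELTA."""
    return [j for j in range(len(d))
            if j != n and a[j] == a[n] and abs(d[j] - d[n]) <= DELTA]


def utility(n, d):
    """Assumed log-throughput utility: log(theta B p_n) + sum_j log(1 - p_j)."""
    return math.log(rate[n][d[n]]) + sum(math.log(1 - p[j]) for j in neighbors(n, d))


def deviate(d, n, x):
    """Profile d with user n moved to location x."""
    return d[:n] + [x] + d[n + 1:]


def is_nash(d):
    """Condition (15): no user gains from a unilateral location change."""
    return all(utility(n, deviate(d, n, x)) <= utility(n, d) + 1e-12
               for n in range(len(d)) for x in LOCATIONS)


def better_response(d, rng):
    """Asynchronous better responses; the FIP guarantees that this loop stops."""
    steps = 0
    while True:
        improving = [(n, x) for n in range(len(d)) for x in LOCATIONS
                     if utility(n, deviate(d, n, x)) > utility(n, d) + 1e-12]
        if not improving:
            return d, steps
        n, x = rng.choice(improving)
        d = deviate(d, n, x)
        steps += 1


d_final, steps = better_response([0, 0, 0], random.Random(1))
print(d_final, steps, is_nash(d_final))
```

Each improving step raises the weighted potential by $w_n$ times the utility gain, and there are finitely many profiles, so the loop cannot cycle.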

Similarly to the spatial channel selection game, we can apply the distributed learning mechanism to achieve the Nash equilibrium. However, due to the cost of long-distance traveling, each user often only wants to move to a new location that is close enough to its current location in each location update. For example, subject to topological constraints, mobile sensors may only be able to move to a neighboring location in each location update. Thus, we next propose a distributed strategic mobility algorithm that takes this constraint on local movement into consideration.

B. Distributed Strategic Mobility Algorithm 

We assume that each user $n$ has a traveling distance constraint $\varsigma_n$, i.e., user $n$ at location $d_n$ can only move to a new location in the restricted set $\mathcal{A}_n^{d_n} = \{d \in \mathcal{A}_n \setminus \{d_n\} : \|d, d_n\| \le \varsigma_n\}$. When a user has a large enough traveling distance constraint $\varsigma_n$ (e.g., $\varsigma_n > \max_{d \in \mathcal{A}_n \setminus \{d_n\}} \|d, d_n\|$), we have $\mathcal{A}_n^{d_n} = \mathcal{A}_n \setminus \{d_n\}$, and the user would like to explore all other locations in each location update decision. When the traveling distance constraint is very small (e.g., $\varsigma_n = 0$), we have $\mathcal{A}_n^{d_n} = \varnothing$, and user $n$ does not change its location and does not participate in the location selection procedure. Furthermore, we assume that each user $n$ only knows its utility $U_n(\mathbf{d}, \mathbf{a})$ through local measurement⁴.
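A minimal sketch of how the restricted set $\mathcal{A}_n^{d_n}$ shrinks and grows with the traveling distance constraint $\varsigma_n$. It assumes Euclidean distance between points of a hypothetical 3 × 3 grid; the function name `restricted_moves` is illustrative only.

```python
import math


def restricted_moves(current, candidates, max_dist):
    """A_n^{d_n}: locations of A_n other than d_n within the traveling distance."""
    return [loc for loc in candidates
            if loc != current and math.dist(loc, current) <= max_dist]


# Hypothetical candidate set A_n: a 3 x 3 grid of points with unit spacing.
A_n = [(x, y) for x in range(3) for y in range(3)]
corner = (0, 0)
print(len(restricted_moves(corner, A_n, 1.0)))   # 2: right and up only
print(len(restricted_moves(corner, A_n, 1.5)))   # 3: adds the diagonal
print(len(restricted_moves(corner, A_n, 10.0)))  # 8: all of A_n except d_n
print(len(restricted_moves(corner, A_n, 0.0)))   # 0: the user never moves
```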

Motivated by the CSMA mechanism in [31] and the distributed P2P streaming algorithm in [32], we design an efficient distributed strategic mobility algorithm by carefully coordinating users' asynchronous location updates to form a Markov chain (with the system state being the location profile $\mathbf{d}$ of all users). The details of the algorithm are given in Algorithm 2. Here users update their locations asynchronously according to a timer value that follows the exponential distribution⁵ with a rate of $r_n|\mathcal{A}_n^{d_n}|$, where the density $r_n$ describes

4 Users can adopt the same sample-average estimation approach as in the distributed learning mechanism in Section IV-A.
5 For ease of exposition, we consider a Markov chain whose count-down process follows an exponential distribution. It is shown in [31], [33], [34] that the stationary distribution is the same as long as the state transition process (i.e., the location update process in our case) follows a general probability distribution with the same mean as in the exponential case. This implies that the proposed mobility algorithm can be implemented in a more practical way. For example, a user can update its location with a waiting time drawn from a power-law distribution, which is a common statistical property of many human activities [35]. Since we allow a user-specific location update density $r_n$ in the algorithm, the waiting time for location updates can further be generated by the user's demand and activities (e.g., making a phone call or writing an email at a location), rather than by an artificial count-down process.
Algorithm 2 Distributed Strategic Mobility Algorithm
1: initialization:
2:   set the temperature $\gamma$ and the location update density $r_n$.
3: end initialization
4: loop for each user $n$ in parallel:
5:   generate a timer value following the exponential distribution with mean $\frac{1}{r_n|\mathcal{A}_n^{d_n}|}$, where $d_n$ is the current location of the user and $|\mathcal{A}_n^{d_n}|$ is the number of feasible locations to move to next.
6:   count down until the timer expires.
7:   if the timer expires then
8:     record the payoff $U_n(\mathbf{d}, \mathbf{a})$.
9:     choose a new location $d_n'$ randomly from the set $\mathcal{A}_n^{d_n}$.
10:    move to the new location $d_n'$ and record the payoff $U_n(\mathbf{d}', \mathbf{a})$.
11:    stay in the new location $d_n'$ with probability $\frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a})} + e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}$, or move back to the original location $d_n$ with probability $\frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a})} + e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}$.
12:  end if
13: end loop



how often user $n$ updates its location. Users with higher QoS requirements may update their locations more often (i.e., with a larger timer density) in order to achieve a higher data rate. Since the exponential distribution has support over $(0, \infty)$ and its probability density function is continuous, the probability that more than one user generates the same timer value and updates its location simultaneously is zero. Furthermore, if user $n$ does not want to move, we have $|\mathcal{A}_n^{d_n}| = 0$, and hence user $n$ will not update its location under the algorithm. If a user has a nonempty set of candidate locations $\mathcal{A}_n^{d_n}$, it will have chances to update its location selection and hence improve its utility, which also improves the system potential $\Phi(\mathbf{d}, \mathbf{a})$ by the property of potential games. In the algorithm, we use a temperature parameter $\gamma$ to control the randomness of users' location selections. As $\gamma$ increases, a user chooses a location of higher utility with a larger probability. As an example, the system state transition diagram of the distributed strategic mobility Markov chain with two users is shown in Figure 4.
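The following sketch simulates Algorithm 2 on a hypothetical two-user instance and compares the fraction of time spent in each location profile with the distribution $e^{\gamma\Phi}$ normalized over $\Theta$, i.e., the stationary distribution (17) stated in Lemma 6. The utility, the potential that matches it, and all numbers are assumptions for illustration. The stay probability of line 11 is evaluated in the equivalent logistic form $1/(1 + e^{-w_n\gamma(U_n(\mathbf{d}',\mathbf{a}) - U_n(\mathbf{d},\mathbf{a}))})$ with $w_n = -\log(1-p_n)$.

```python
import itertools
import math
import random

# Hypothetical toy instance (assumption, not from the paper): two users on a
# line of 4 locations, both on the same fixed channel; a user may move only to
# an adjacent location (traveling distance 1).
LOCS = [0, 1, 2, 3]
p = [0.3, 0.6]                       # contention probabilities p_n
r = [1.0, 2.0]                       # location update densities r_n
rate = [[1.0, 2.0, 1.0, 3.0],        # hypothetical theta * B * p_n per location
        [2.0, 1.0, 3.0, 1.0]]
GAMMA = 2.0                          # temperature gamma
w = [-math.log(1 - pn) for pn in p]  # weights w_n = -log(1 - p_n)


def near(i, j, d):
    """Users i and j interfere if they are within distance 1 of each other."""
    return i != j and abs(d[i] - d[j]) <= 1


def utility(n, d):
    """Assumed log-throughput utility U_n(d, a) for the fixed channel profile."""
    return math.log(rate[n][d[n]]) + sum(
        math.log(1 - p[j]) for j in range(len(d)) if near(n, j, d))


def potential(d):
    """Weighted potential that matches the assumed utility."""
    return sum(w[i] * (math.log(rate[i][d[i]]) + 0.5 * sum(
        math.log(1 - p[j]) for j in range(len(d)) if near(i, j, d)))
        for i in range(len(d)))


def moves(n, d):
    """Restricted move set: adjacent locations other than the current one."""
    return [x for x in LOCS if x != d[n] and abs(x - d[n]) <= 1]


def simulate(d, horizon, rng):
    """Run Algorithm 2 and return the fraction of time spent in each state."""
    t, occupancy = 0.0, {}
    while t < horizon:
        # Exponential timers are memoryless, so redrawing every user's timer
        # after each event gives the same process as continuing the countdown.
        dt, n = min((rng.expovariate(r[k] * len(moves(k, d))), k)
                    for k in range(len(d)))
        occupancy[tuple(d)] = occupancy.get(tuple(d), 0.0) + dt
        t += dt
        d_new = list(d)
        d_new[n] = rng.choice(moves(n, d))
        # Line 11 of Algorithm 2, written as a logistic function of the gap.
        gap = w[n] * GAMMA * (utility(n, d_new) - utility(n, d))
        if rng.random() < 1.0 / (1.0 + math.exp(-gap)):
            d = d_new
    return {s: v / t for s, v in occupancy.items()}


def gibbs():
    """Distribution (17): Pr(d) proportional to exp(gamma * Phi(d))."""
    states = list(itertools.product(LOCS, repeat=2))
    weights = [math.exp(GAMMA * potential(s)) for s in states]
    z = sum(weights)
    return {s: wt / z for s, wt in zip(states, weights)}


empirical = simulate([0, 0], horizon=20000.0, rng=random.Random(7))
target = gibbs()
print(max(abs(empirical.get(s, 0.0) - pr) for s, pr in target.items()))
```

The printed value is the largest gap between the simulated occupancy and the target distribution; it shrinks as the horizon grows.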

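[Fig. 4 panels: "Location Map" (Locations 1 to 4) and "Two-users distributed strategic mobility Markov chain"]

Fig. 4. System state transition diagram of the distributed strategic mobility Markov chain with two users. In the location map on the left-hand side, one location is reachable directly from another location if these two locations are connected by an edge. In the transition diagram of the Markov chain on the right-hand side, $(d_1, d_2)$ denotes the system state with $d_1$ and $d_2$ being the locations of users 1 and 2, respectively. The transition between two system states is feasible if they are connected by a link.

Since user $n$ randomly chooses a new location $d_n' \in \mathcal{A}_n^{d_n}$ and stays there with probability

$$\frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a})} + e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}},$$

the probability that user $n$'s location update (once its timer expires) moves the system from state $\mathbf{d} = (d_n, \mathbf{d}_{-n})$ to state $\mathbf{d}' = (d_n', \mathbf{d}_{-n})$ is given as

$$\frac{1}{|\mathcal{A}_n^{d_n}|} \cdot \frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a})} + e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}.$$

Since each user $n$ revises its location according to the countdown timer mechanism with a rate of $r_n|\mathcal{A}_n^{d_n}|$, if $d_n' \in \mathcal{A}_n^{d_n}$, the transition rate from state $\mathbf{d}$ to state $\mathbf{d}'$ is given as

$$q_{\mathbf{d},\mathbf{d}'} = r_n \frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a})} + e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}. \qquad (16)$$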

Otherwise, we have $q_{\mathbf{d},\mathbf{d}'} = 0$. We show in Lemma 6 that the distributed strategic mobility Markov chain is time reversible. Time reversibility means that when tracing the Markov chain backwards, the stochastic behavior of the reverse Markov chain remains the same. A time-reversible Markov chain satisfies the detailed balance equations, which makes its stationary distribution easy to characterize; since the chain is also irreducible (see the proof of Lemma 6), this stationary distribution is unique, which guarantees the convergence of the distributed strategic mobility algorithm.

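Lemma 6. The distributed strategic mobility algorithm induces a time-reversible Markov chain with the unique stationary distribution

$$\Pr(\mathbf{d}, \mathbf{a}) = \frac{e^{\gamma\Phi(\mathbf{d},\mathbf{a})}}{\sum_{\mathbf{d}' \in \Theta} e^{\gamma\Phi(\mathbf{d}',\mathbf{a})}}, \quad \forall \mathbf{d} \in \Theta, \qquad (17)$$

where $\Pr(\mathbf{d}, \mathbf{a})$ is the probability that the location profile $\mathbf{d}$ is chosen by all users under the fixed channel selection strategy profile $\mathbf{a}$.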
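The detailed balance equations behind Lemma 6 can be checked exactly on a small instance by building every transition rate (16) and comparing $\Pr(\mathbf{d},\mathbf{a})q_{\mathbf{d},\mathbf{d}'}$ with $\Pr(\mathbf{d}',\mathbf{a})q_{\mathbf{d}',\mathbf{d}}$. This is a sketch under stated assumptions: three users on a hypothetical ring of four locations, one shared channel, interference only between co-located users, a logarithmic utility with its matching weighted potential, and random parameters.

```python
import itertools
import math
import random

# Hypothetical random instance (assumption): N users on a ring of LOCS
# locations, all on one channel; users interfere only when co-located.
rng = random.Random(3)
N, LOCS, GAMMA = 3, 4, 2.0
r = [1.0, 0.5, 2.0]                                   # update densities r_n
p = [rng.uniform(0.1, 0.9) for _ in range(N)]         # contention probabilities
log_rate = [[math.log(rng.uniform(0.5, 3.0)) for _ in range(LOCS)] for _ in range(N)]


def utility(n, d):
    return log_rate[n][d[n]] + sum(math.log(1 - p[j]) for j in range(N)
                                   if j != n and d[j] == d[n])


def potential(d):
    return sum(-math.log(1 - p[i]) * (log_rate[i][d[i]] + 0.5 * sum(
        math.log(1 - p[j]) for j in range(N) if j != i and d[j] == d[i]))
        for i in range(N))


def moves(n, d):
    """A_n^{d_n} on the ring: the two adjacent locations."""
    return [(d[n] + 1) % LOCS, (d[n] - 1) % LOCS]


def rate_q(d, d2):
    """Transition rate (16); zero unless exactly one user makes an allowed move."""
    changed = [n for n in range(N) if d[n] != d2[n]]
    if len(changed) != 1 or d2[changed[0]] not in moves(changed[0], d):
        return 0.0
    n = changed[0]
    w = -math.log(1 - p[n])
    e_old = math.exp(w * GAMMA * utility(n, d))
    e_new = math.exp(w * GAMMA * utility(n, d2))
    return r[n] * e_new / (e_old + e_new)


states = list(itertools.product(range(LOCS), repeat=N))
z = sum(math.exp(GAMMA * potential(s)) for s in states)
pr = {s: math.exp(GAMMA * potential(s)) / z for s in states}   # distribution (17)

# Detailed balance for every ordered pair of states.
worst = max(abs(pr[s] * rate_q(s, t) - pr[t] * rate_q(t, s))
            for s in states for t in states)
# Global balance (inflow equals outflow) follows; check it explicitly as well.
flow = max(abs(sum(pr[s] * rate_q(s, t) for s in states if s != t)
               - pr[t] * sum(rate_q(t, s) for s in states if s != t))
           for t in states)
print(worst, flow)
```

Both printed numbers should be at the level of floating-point round-off; the cancellation works because the potential changes by exactly $w_n$ times the moving user's utility change.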



The proof is given in Appendix F. The key of the proof is to verify that the distribution in (17) satisfies the detailed balance equations of the distributed strategic mobility Markov chain, i.e., $\Pr(\mathbf{d}, \mathbf{a})q_{\mathbf{d},\mathbf{d}'} = \Pr(\mathbf{d}', \mathbf{a})q_{\mathbf{d}',\mathbf{d}}$. Let $\Phi^*(\mathbf{a}) = \max_{\mathbf{d} \in \Theta} \Phi(\mathbf{d}, \mathbf{a})$ be the maximum of the potential function of the game, and let $\bar{\Phi}(\mathbf{a})$ be the expected performance of the distributed strategic mobility algorithm. We have

Theorem 4. For the distributed strategic mobility algorithm, as the temperature $\gamma \to \infty$, the expected performance $\bar{\Phi}(\mathbf{a})$ approaches $\Phi^*(\mathbf{a})$, and the distributed strategic mobility algorithm converges to a Nash equilibrium.

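Proof: Let $P_{\mathbf{d}}$ be the probability that the location profile $\mathbf{d}$ is chosen. It is well known that the distribution $\Pr(\mathbf{d}, \mathbf{a})$ in (17) is the optimal solution of the following maximization problem [36]:

$$\max \sum_{\mathbf{d} \in \Theta} P_{\mathbf{d}}\Phi(\mathbf{d}, \mathbf{a}) - \frac{1}{\gamma}\sum_{\mathbf{d} \in \Theta} P_{\mathbf{d}}\log P_{\mathbf{d}} \quad \text{subject to} \quad \sum_{\mathbf{d} \in \Theta} P_{\mathbf{d}} = 1. \qquad (18)$$

Thus, as $\gamma \to \infty$, problem (18) becomes the following problem:

$$\max \sum_{\mathbf{d} \in \Theta} P_{\mathbf{d}}\Phi(\mathbf{d}, \mathbf{a}) \quad \text{subject to} \quad \sum_{\mathbf{d} \in \Theta} P_{\mathbf{d}} = 1. \qquad (19)$$

Let $P_{\mathbf{d}}^*$ be the optimal solution to problem (19). We thus know that, as $\gamma \to \infty$, the stationary distribution $\Pr(\mathbf{d}, \mathbf{a})$ approaches $P_{\mathbf{d}}^*$. This implies that, as $\gamma \to \infty$, $\bar{\Phi}(\mathbf{a}) = \sum_{\mathbf{d} \in \Theta}\Pr(\mathbf{d}, \mathbf{a})\Phi(\mathbf{d}, \mathbf{a})$ approaches $\Phi^*(\mathbf{a}) = \sum_{\mathbf{d} \in \Theta} P_{\mathbf{d}}^*\Phi(\mathbf{d}, \mathbf{a})$. ■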
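A small numerical illustration of the proof of Theorem 4, using made-up potential values: as $\gamma$ grows, the expected potential under the distribution (17) rises toward the maximum, and for a fixed $\gamma$ that distribution maximizes the objective of (18) (checked here only against randomly drawn probability vectors).

```python
import math
import random


def gibbs(phis, gamma):
    """Pr(d) proportional to exp(gamma * Phi(d)), computed with a max shift."""
    top = max(phis)
    weights = [math.exp(gamma * (phi - top)) for phi in phis]  # avoids overflow
    z = sum(weights)
    return [wt / z for wt in weights]


def objective(probs, phis, gamma):
    """Objective of (18): expected potential plus entropy divided by gamma."""
    return (sum(q * phi for q, phi in zip(probs, phis))
            - sum(q * math.log(q) for q in probs if q > 0) / gamma)


phis = [1.07, 1.03, 1.01, 0.93, 0.68, 0.31]   # hypothetical Phi(d, a) values
expected_by_gamma = []
for gamma in (1, 10, 100, 1000):
    probs = gibbs(phis, gamma)
    expected_by_gamma.append(sum(q * phi for q, phi in zip(probs, phis)))
    print(gamma, round(expected_by_gamma[-1], 4))

# For a fixed gamma, no random distribution beats the Gibbs one on (18).
rng = random.Random(0)
gamma = 10.0
best = objective(gibbs(phis, gamma), phis, gamma)
gibbs_wins = True
for _ in range(1000):
    raw = [rng.random() for _ in phis]
    other = [x / sum(raw) for x in raw]
    gibbs_wins = gibbs_wins and objective(other, phis, gamma) <= best + 1e-12
print(gibbs_wins)
```

The increase with $\gamma$ is not a coincidence: the derivative of the expected potential with respect to $\gamma$ equals the variance of $\Phi$ under the current distribution, which is nonnegative.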

Note that in practice we can only implement a finite value of the temperature $\gamma$. The value of $\gamma$ is bounded such that $e^{\gamma\Phi(\mathbf{d},\mathbf{a})}$ does not exceed the largest floating-point number that the computer can represent. Numerical results show that the algorithm with a large enough feasible $\gamma$ can converge to a near-optimal solution such that $\bar{\Phi}(\mathbf{a})$ is close to $\Phi^*(\mathbf{a})$. We then consider the computational complexity of the algorithm. For each iteration of each user, Lines 4 to 13 of Algorithm 2 only involve random value generation and subtraction operations for the count-down, and hence have a complexity of $O(1)$. Suppose that it takes $C$ iterations for the algorithm to converge. Then the total computational complexity for $N$ users is $O(CN)$.
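The overflow limit mentioned above comes from the exponentials in line 11 of Algorithm 2. One standard workaround, not taken from the paper, is to note that the stay probability depends only on the utility difference, since $\frac{e^{x'}}{e^{x} + e^{x'}} = \frac{1}{1 + e^{-(x'-x)}}$, and to evaluate this logistic function in a branch that never exponentiates a large positive number. A minimal sketch (function names are illustrative):

```python
import math


def stay_probability(u_old, u_new, p_n, gamma):
    """Line 11 of Algorithm 2 without forming e^{w * gamma * U} directly."""
    x = -math.log(1 - p_n) * gamma * (u_new - u_old)
    # e^{w g U_new} / (e^{w g U_old} + e^{w g U_new}) = 1 / (1 + e^{-x}).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)            # x < 0 here, so e lies in (0, 1) and cannot overflow
    return e / (1.0 + e)


def naive_stay_probability(u_old, u_new, p_n, gamma):
    """Direct transcription of line 11; overflows once w * gamma * U exceeds ~709."""
    w = -math.log(1 - p_n)
    e_old = math.exp(w * gamma * u_old)
    e_new = math.exp(w * gamma * u_new)
    return e_new / (e_old + e_new)


print(stay_probability(1.0, 1.1, 0.5, 10.0), naive_stay_probability(1.0, 1.1, 0.5, 10.0))
print(stay_probability(1.0, 1.1, 0.5, 1e4))   # the naive form raises OverflowError here
```

Both forms agree whenever the naive one is representable, so this only removes the numerical ceiling on $\gamma$; it does not change the stationary distribution.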

C. Joint Channel Selection and Strategic Mobility 

We now consider the case where each user has the flexibility to choose its location and channel simultaneously. Similarly to Section IV-A, we formulate the problem as a joint spatial channel selection and mobility game $\Gamma = (\mathcal{N}, (\mathbf{d}, \mathbf{a}), \{U_n\}_{n\in\mathcal{N}})$. A location and channel profile $(\mathbf{d}^*, \mathbf{a}^*)$ is a Nash equilibrium if and only if

$$(d_n^*, a_n^*) = \arg\max_{d_n \in \mathcal{A}_n,\, a_n \in \mathcal{M}} U_n(d_n, \mathbf{d}_{-n}^*, a_n, \mathbf{a}_{-n}^*), \quad \forall n \in \mathcal{N}. \qquad (20)$$

We show that the game $\Gamma$ is also a weighted potential game.

Lemma 7. The joint spatial channel selection and mobility game $\Gamma$ is a weighted potential game, with the same potential function $\Phi(\mathbf{d}, \mathbf{a})$ as in Lemma 5 and the weight $w_n = -\log(1 - p_n)$.

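Proof: Suppose that a user $k$ changes its current location $d_k$ and channel $a_k$ to a location $d_k'$ and channel $a_k'$, and the system state changes from $(\mathbf{d}, \mathbf{a})$ to $(\mathbf{d}', \mathbf{a}')$ accordingly. Then the change in the potential function $\Phi$ is given as

$$\begin{aligned}
\Phi(\mathbf{d}', \mathbf{a}') - \Phi(\mathbf{d}, \mathbf{a}) &= \Phi(\mathbf{d}', \mathbf{a}') - \Phi(\mathbf{d}', \mathbf{a}) + \Phi(\mathbf{d}', \mathbf{a}) - \Phi(\mathbf{d}, \mathbf{a}) \\
&= -\log(1 - p_k)\big(U_k(\mathbf{d}', \mathbf{a}') - U_k(\mathbf{d}', \mathbf{a})\big) - \log(1 - p_k)\big(U_k(\mathbf{d}', \mathbf{a}) - U_k(\mathbf{d}, \mathbf{a})\big) \\
&= -\log(1 - p_k)\big(U_k(\mathbf{d}', \mathbf{a}') - U_k(\mathbf{d}, \mathbf{a})\big),
\end{aligned}$$

where the second equality applies the weighted potential property of the spatial channel selection game (with locations fixed at $\mathbf{d}'$) to the first difference and Lemma 5 to the second. This completes the proof. ■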
Lemma 7 implies the following key result.

Theorem 5. The joint spatial channel selection and mobility game has a Nash equilibrium and the finite 
improvement property. 

To reach a Nash equilibrium of the joint spatial channel selection and mobility game, we can run the distributed learning mechanism for channel selection and the distributed strategic mobility algorithm together. According to the numerical results, the distributed learning mechanism can converge to a Nash equilibrium in less than one minute (fewer than 300 × 100 time slots, with each time slot assumed to be 2 milliseconds, which is longer than a normal time slot in the standard GSM system). Thus, we can implement the distributed strategic mobility algorithm at a larger time scale (say, every few minutes), and the distributed learning for channel selection at a smaller time scale (say, every few milliseconds). Under such a separation of time scales, it is reasonable to assume that the distributed learning mechanism operating at the small time scale converges between two updates at the large time scale. We show that

Theorem 6. With the separation of time scales, the joint distributed learning mechanism and strategic mobility algorithm converges to a Nash equilibrium of the joint spatial channel selection and mobility game as the temperature $\gamma \to \infty$.

The proof is given in Appendix G. The key idea of the proof is that the distributed learning mechanism globally maximizes the potential function $\Phi(\mathbf{d}, \mathbf{a})$ in the decision variable $\mathbf{a}$ given the fixed location profile $\mathbf{d}$, i.e., $\max_{\mathbf{a}}\Phi(\mathbf{d}, \mathbf{a})$. Then the strategic mobility algorithm at the larger time scale also maximizes the potential function $\Phi(\mathbf{d}, \mathbf{a})$ in terms of the decision variable $\mathbf{d}$, given that the channel selections are $\mathbf{a}_{\mathbf{d}}^* = \arg\max_{\mathbf{a}}\Phi(\mathbf{d}, \mathbf{a})$. That is, the algorithm converges to the equilibrium in which the best location profile $\mathbf{d}^*$ with the maximum potential $\Phi(\mathbf{d}^*, \mathbf{a}_{\mathbf{d}^*}^*)$ is selected. A maximum point of the potential function is also a Nash equilibrium of the potential game [25].
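A sketch of the two-time-scale structure, under simplifying assumptions that are not the paper's: the small-time-scale channel stage uses asynchronous better responses as a stand-in for the distributed learning mechanism, the updating user is picked uniformly at random rather than by exponential timers, and the instance, utility, and parameters are hypothetical. It also checks the arithmetic behind the one-minute figure (300 × 100 slots of 2 ms each).

```python
import math
import random

SLOT_S = 0.002                # 2 ms per time slot, as assumed in the text
LEARNING_SLOTS = 300 * 100    # bound on slots for channel learning to converge
print("channel learning settles within", LEARNING_SLOTS * SLOT_S, "s")

# Hypothetical toy instance (assumption): 3 users, 2 channels, 4 locations on a
# line; users on the same channel within distance 1 of each other interfere.
N, M, LOCS, GAMMA = 3, 2, 4, 20.0
p = [0.4, 0.5, 0.6]
log_rate = [[0.0, 0.5], [0.3, 0.1], [0.2, 0.6]]  # hypothetical log(theta B p)


def utility(n, d, a):
    return log_rate[n][a[n]] + sum(math.log(1 - p[j]) for j in range(N)
                                   if j != n and a[j] == a[n] and abs(d[j] - d[n]) <= 1)


def settle_channels(d, a, rng):
    """Small time scale. Stand-in for the distributed learning mechanism:
    asynchronous better responses until no user can improve (finite by the FIP)."""
    a = list(a)
    while True:
        better = [(n, m) for n in range(N) for m in range(M)
                  if utility(n, d, a[:n] + [m] + a[n + 1:]) > utility(n, d, a) + 1e-12]
        if not better:
            return a
        n, m = rng.choice(better)
        a[n] = m


def two_time_scale(d, a, updates, rng):
    """Large time scale: one trial move per update, accepted as in Algorithm 2."""
    a = settle_channels(d, a, rng)
    for _ in range(updates):
        n = rng.randrange(N)      # simplification: uniform choice instead of timers
        step = rng.choice([x for x in (d[n] - 1, d[n] + 1) if 0 <= x < LOCS])
        d_new = d[:n] + [step] + d[n + 1:]
        a_new = settle_channels(d_new, a, rng)   # channels re-converge first
        x = -math.log(1 - p[n]) * GAMMA * (utility(n, d_new, a_new) - utility(n, d, a))
        if rng.random() < 1.0 / (1.0 + math.exp(-x)):
            d, a = d_new, a_new
        # Otherwise the user moves back and the previous, already settled
        # profile (d, a) is restored.
    return d, a


d_fin, a_fin = two_time_scale([0, 0, 0], [0, 0, 0], 200, random.Random(5))
print(d_fin, a_fin)
```

Because the channel stage always finishes before the next location update, the final channel profile is an equilibrium of the channel game for the final locations; convergence of the whole procedure is the subject of Theorem 6 and is not shown by this sketch.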

VI. Uniqueness and Efficiency of Nash Equilibrium 

In previous sections, we have considered the existence of Nash equilibrium and proposed distributed 
algorithms for achieving the equilibrium. We will further explore the uniqueness and efficiency of the 
Nash equilibrium, which can offer more useful insights for the game theoretic approach for distributed 
spectrum sharing with spatial reuse. 

A. Uniqueness of Nash equilibrium 

Due to the combinatorial nature of joint channel and location selections, the Nash equilibrium of the game is not unique in general. For example, consider a game with two users $\mathcal{N} = \{1, 2\}$, two channels $\mathcal{M} = \{1, 2\}$, and two locations $\mathcal{A} = \{d^1, d^2\}$. The two locations are close such that $\|d^1, d^2\| < \delta$, and both users and channels are homogeneous such that $\theta_m = \theta$, $B^n_{m,d_n} = B$, and $p_n = p$. In this case, there are eight Nash equilibria $((d_1, d_2), (a_1, a_2))$ for the game, i.e., $((d^1, d^1), (1, 2))$, $((d^2, d^2), (1, 2))$, $((d^1, d^1), (2, 1))$, $((d^2, d^2), (2, 1))$, $((d^1, d^2), (1, 2))$, $((d^1, d^2), (2, 1))$, $((d^2, d^1), (1, 2))$, and $((d^2, d^1), (2, 1))$.
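The count of eight equilibria can be verified by brute force. The sketch assumes a logarithmic utility, $\log(\theta B p)$ plus $\log(1-p)$ when the other user is on the same channel (both locations are within $\delta$, so location does not affect interference), with arbitrary placeholder values of $\theta$, $B$, and $p$:

```python
import itertools
import math

# Homogeneous example of Section VI-A with an assumed log-throughput utility.
P = 0.5                       # common contention probability p (hypothetical)
C = math.log(0.5 * 1.0 * P)   # log(theta * B * p), hypothetical theta = 0.5, B = 1
LOCS, CHANS = ("d1", "d2"), (1, 2)


def utility(n, d, a):
    """Both locations are within delta, so only the channel choice matters."""
    other = 1 - n
    return C + (math.log(1 - P) if a[other] == a[n] else 0.0)


def is_nash(d, a):
    """No user gains by changing its location, its channel, or both."""
    for n in (0, 1):
        for x, m in itertools.product(LOCS, CHANS):
            d2, a2 = list(d), list(a)
            d2[n], a2[n] = x, m
            if utility(n, d2, a2) > utility(n, d, a) + 1e-12:
                return False
    return True


equilibria = [(d, a) for d in itertools.product(LOCS, repeat=2)
              for a in itertools.product(CHANS, repeat=2) if is_nash(d, a)]
print(len(equilibria))   # 8
for eq in equilibria:
    print(eq)
```

Exactly the profiles with different channels survive, each combined with any of the four location pairs, which matches the list above.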

In general, selecting among multiple Nash equilibria is quite hard, and the proposed algorithm is guaranteed to converge to one of the Nash equilibria.

B. Price of Anarchy 

Since the Nash equilibrium is typically not unique, we then study the efficiency of Nash equilibria. 
Following the definition of price of anarchy (PoA) in game theory [7], we will quantify the efficiency 
ratio of the worst-case Nash equilibrium over the centralized optimal solution. We first consider the spatial 



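channel selection game with a fixed spectrum access location profile $\mathbf{d}$. Let $\mathcal{H}$ be the set of Nash equilibria of the game. Then the PoA is defined as

$$\text{PoA} = \frac{\min_{\mathbf{a} \in \mathcal{H}} \sum_{n\in\mathcal{N}} U_n(\mathbf{d}, \mathbf{a})}{\max_{\mathbf{a} \in \mathcal{M}^N} \sum_{n\in\mathcal{N}} U_n(\mathbf{d}, \mathbf{a})},$$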

which is never greater than 1. A larger PoA implies that the set of Nash equilibria is more efficient (in the worst-case sense), using the centralized optimum as a benchmark. Let $w = \max_{n\in\mathcal{N}}\{-\log(1 - p_n)\}$, $E(\mathbf{d}) = \min_{n\in\mathcal{N}}\max_{m\in\mathcal{M}}\{\log(\theta_m B^n_{m,d_n} p_n)\}$, and $K(\mathbf{d}) = \max_{n\in\mathcal{N}}\{|\mathcal{N}_n(\mathbf{d})|\}$. We can show that

Theorem 7. For the spatial channel selection game with a fixed spectrum access location profile $\mathbf{d}$, the PoA is no less than $1 - \frac{K(\mathbf{d})\,w}{E(\mathbf{d})}$.
The proof is given in Appendix H. Intuitively, when users are less aggressive in channel contention (i.e., $w$ is smaller) and users are more homogeneous in terms of channel utilization (i.e., $E(\mathbf{d})$ is larger), the worst-case Nash equilibrium is closer to the centralized optimum, and hence the PoA is larger. Moreover, Theorem 7 implies that we can increase the efficiency of spectrum sharing by better exploiting the gain of spatial reuse (i.e., reducing the number of interference edges $K(\mathbf{d})$ on the interference graph). Similarly, by defining $\eta = \max_{\mathbf{d}\in\Theta}\{K(\mathbf{d})/E(\mathbf{d})\}$, we see that the PoA of the joint spatial channel selection and mobility game is no less than $1 - \eta w$.
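A minimal sketch that evaluates the lower bound of Theorem 7, in the form given above, for a hypothetical four-user interference graph. It assumes $E(\mathbf{d}) > 0$, since otherwise the bound carries no information, and all numbers are placeholders.

```python
import math


def poa_lower_bound(p, log_rates, neighbors):
    """Bound 1 - K(d) * w / E(d) of Theorem 7 for a fixed location profile d.

    p[n]: contention probability p_n
    log_rates[n]: log(theta_m * B^n_{m,d_n} * p_n) for every channel m
    neighbors[n]: users adjacent to n in the interference graph
    """
    w = max(-math.log(1 - pn) for pn in p)     # most aggressive user
    e = min(max(row) for row in log_rates)     # E(d)
    k = max(len(nb) for nb in neighbors)       # K(d)
    if e <= 0:
        raise ValueError("the bound is only informative when E(d) > 0")
    return 1 - k * w / e


p = [0.1, 0.2, 0.1, 0.15]
log_rates = [[3.0, 4.0], [3.5, 2.0], [4.5, 1.0], [3.2, 3.8]]
ring = [{1, 3}, {0, 2}, {1, 3}, {0, 2}]   # each user interferes with two others
isolated = [set() for _ in range(4)]      # no interference at all
print(round(poa_lower_bound(p, log_rates, ring), 4))   # 0.8725
print(poa_lower_bound(p, log_rates, isolated))         # 1.0
```

Removing interference edges (smaller $K(\mathbf{d})$) raises the bound, which is the spatial-reuse effect described in the text.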

The PoA characterizes the worst-case performance of Nash equilibria. Numerical results in Section VII demonstrate that the Nash equilibrium reached by the proposed algorithm is often more efficient, with a performance loss of less than 8% compared with the centralized optimal solution.

VII. Numerical Results 

We now evaluate the proposed algorithms by simulations. We consider a Rayleigh fading channel environment. The data rate of secondary user $n$ on an idle channel $m$ at location $d$ is given as $b^n_{m,d} = h_d\tilde{b}^n_m$. Here $h_d$ is a location-dependent parameter. Parameter $\tilde{b}^n_m$ is the data rate computed according to the Shannon capacity, i.e., $\tilde{b}^n_m = B_m\log_2\big(1 + \frac{\zeta_n g^n_m}{\omega^n_m}\big)$, where $B_m$ is the bandwidth of channel $m$, $\zeta_n$ is the power adopted by user $n$, $\omega^n_m$ is the noise power, and $g^n_m$ is the channel gain (a realization of a random variable that follows the exponential distribution with mean $\bar{g}^n_m$). In the following simulations, we set $B_m = 10$ MHz, $\omega^n_m = -100$ dBm, and $\zeta_n = 100$ mW. By choosing different location parameters $h_d$ and mean channel gains $\bar{g}^n_m$, we obtain different mean data rates $B^n_{m,d} = E[b^n_{m,d}] = h_d E[\tilde{b}^n_m]$ for different channels, locations, and users. For simplicity, we set the channel availabilities $\theta_m = 0.5$.
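To make the rate model concrete, the sketch below estimates $E[\tilde{b}^n_m]$ by Monte Carlo using the stated $B_m$, noise power, and transmit power. The mean channel gain $\bar{g}^n_m = 10^{-12}$ is a hypothetical value, chosen so that the mean SNR is 1; the estimate is compared with the Jensen upper bound $B_m\log_2(1 + \zeta_n\bar{g}^n_m/\omega^n_m)$.

```python
import math
import random

B_HZ = 10e6                           # B_m = 10 MHz
NOISE_W = 10 ** (-100 / 10) / 1000    # omega = -100 dBm, converted to watts
POWER_W = 0.1                         # zeta_n = 100 mW


def mean_rate(mean_gain, h_d=1.0, samples=200_000, seed=0):
    """Monte Carlo estimate of h_d * E[B log2(1 + zeta g / omega)], g ~ Exp(mean_gain)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        g = rng.expovariate(1.0 / mean_gain)   # Rayleigh fading: exponential power gain
        total += B_HZ * math.log2(1 + POWER_W * g / NOISE_W)
    return h_d * total / samples


MEAN_GAIN = 1e-12                     # hypothetical mean gain, giving a mean SNR of 1
estimate = mean_rate(MEAN_GAIN)
jensen_bound = B_HZ * math.log2(1 + POWER_W * MEAN_GAIN / NOISE_W)
print(f"{estimate / 1e6:.2f} Mbps (Jensen bound {jensen_bound / 1e6:.2f} Mbps)")
```

For a mean SNR of 1 the exact value is $B_m e\,E_1(1)/\ln 2 \approx 8.6$ Mbps, where $E_1$ is the exponential integral, so fading costs roughly 14% relative to the 10 Mbps of a non-fading link with the same mean SNR. The location parameter $h_d$ simply scales the result.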
Fig. 5. Interference graphs

A. Distributed Learning For Spatial Channel Selection 

We first evaluate the distributed learning algorithm for channel selection with fixed user locations. For the distributed learning algorithm initialization, we set the length of each decision period to $K = 100$, which achieves a good estimation of the expected payoff. For the smoothing factor $\mu_T$, a higher value can lead to faster convergence. We hence choose $\mu_T$ to achieve the fastest convergence while satisfying the convergence condition in Theorem 2.

Since locations are fixed, we set the location parameter $h_d = 1$. We consider a network of $M = 5$ channels and $N = 9$ users with four different interference graphs (see Figure 5). Graphs (a), (b), and (c) are commonly used regular interference graphs, and Graph (d) is a randomly generated non-regular interference graph. Let $\mathbf{B}_n = (B^n_1, \ldots, B^n_M)$ be the mean data rate vector of user $n$ on the $M$ channels. We set $\mathbf{B}_1 = \mathbf{B}_2 = \mathbf{B}_3 = (0.1, 0.3, 0.8, 1.0, 1.5)$ Mbps, $\mathbf{B}_4 = \mathbf{B}_5 = \mathbf{B}_6 = (0.2, 0.6, 1.6, 2.0, 3.0)$ Mbps, and $\mathbf{B}_7 = \mathbf{B}_8 = \mathbf{B}_9 = (0.5, 1.5, 4.0, 5.0, 7.5)$ Mbps. The fixed channel contention probabilities $p_n$ of the users are randomly assigned from the set $\{0.1, 0.2, \ldots, 0.9\}$.

Let us first look at the convergence dynamics, using graph (d) in Figure 5 as an example. Figure 6 shows the learning dynamics of user 4 in terms of the channel selection probabilities on the 5 channels, which demonstrates the convergence of the distributed learning algorithm. Figure 7 shows the learning dynamics of the potential function value $\Phi$. We see that the distributed learning algorithm can lead the potential function of the spatial channel selection game to the maximum point, which is a Nash equilibrium according to the property of potential games.

To benchmark the performance of the distributed learning algorithm, we compare it with the solution obtained by the centralized global optimization $\max_{\mathbf{a}}\sum_{n\in\mathcal{N}} U_n(\mathbf{d}, \mathbf{a})$ on all the interference graphs.

Fig. 6. Learning dynamics of user 4's channel selection probabilities
Fig. 7. Learning dynamics of potential function value



The results are shown in Figure 8. We see that the performance loss of the distributed learning is less than 5% in all cases.

We look at another network with $N = 50$ users randomly scattered across a square area with a side length of 250 m (see Figure 9). We set the users' transmission range to $\delta = 20$, 40, 60, 80, and 100 m, respectively. Figure 10 shows the performance comparison between distributed learning and the centralized optimization solution. As the transmission range $\delta$ increases, the performance of both the distributed learning and the centralized global optimization solutions decreases. In all cases, the performance loss of the distributed learning algorithm is less than 8%, compared with the centralized global optimization solution. This shows the efficiency of the distributed learning algorithm.
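The topology in Figure 9 is a random geometric graph. A sketch of how such an interference graph can be generated, and how the number of interference edges and $K(\mathbf{d})$ (the maximum degree used in Theorem 7) grow with $\delta$; the positions are random and hypothetical, not the ones used for the figures:

```python
import itertools
import math
import random


def interference_edges(points, delta):
    """Edges of the interference graph: two users interfere if within delta."""
    return [(i, j) for i, j in itertools.combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= delta]


def max_degree(num_users, edges):
    """K(d) in Theorem 7: the largest number of interfering neighbors."""
    degree = [0] * num_users
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return max(degree)


rng = random.Random(42)   # hypothetical positions, not the ones in Figure 9
users = [(rng.uniform(0, 250), rng.uniform(0, 250)) for _ in range(50)]
edge_counts = []
for delta in (20, 40, 60, 80, 100):
    edges = interference_edges(users, delta)
    edge_counts.append(len(edges))
    print(f"delta={delta:>3} m  edges={len(edges):>4}  K(d)={max_degree(50, edges)}")
```

Denser interference graphs leave less room for spatial reuse, which is consistent with the decreasing performance observed as $\delta$ grows.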
Fig. 8. Comparison of distributed learning and global optimization
Fig. 9. A square area with a side length of 250 m and 50 scattered users with a transmission range $\delta = 60$ m. Each user is represented by a dot, and two users interfere with each other if they are connected by an edge.



B. Joint Distributed Learning and Strategic Mobility 

We next study the joint distributed learning and strategic mobility algorithm. We consider a location map as shown in graph (a) of Figure 11. Black cells are obstacles, and no user can move there. A user in a cell can interfere with the users in the same cell and in the neighboring cells (horizontally, vertically, and diagonally). All users are initially located in the same cell in the bottom-left corner, and each user is allowed to move to a neighboring cell when it gets the chance to update its location. Each cell is randomly assigned a location parameter $h_d$ from the set $\{0.5, 1.0, 2.0\}$, and users have different mean data rates as specified in Section VII-A.

We implement the joint algorithm with the temperature $\gamma = 10$, 20, and 50, respectively. The location update process follows the exponential distribution with a mean of 10. Figure 11 shows users' locations and channel selections at iteration steps $t = 2$, 50, and 100, respectively (with $\gamma = 50$). We observe that users try to spread out in terms of physical locations and meanwhile choose channels with higher data rates, in order to maximize their payoffs. From Figure 12, we see that the performance of the algorithm improves as $\gamma$ increases, and the convergence time also increases accordingly. When $\gamma = 50$, the performance loss of the joint algorithm is less than 6%, compared with the global optimal solution, i.e., $\max_{\mathbf{d},\mathbf{a}}\sum_{n\in\mathcal{N}} U_n(\mathbf{d}, \mathbf{a})$. This shows the efficiency of the Nash equilibrium. When users are static (without strategic mobility) and close by, the performance loss of the distributed learning for channel selection can be as high as 18%, which justifies the motivation for the strategic mobility design.

We further run simulations where $\gamma = 50$ and the location update process follows the uniform distribution and the power-law distribution, respectively, with the same mean as in the exponential distribution case. The results in Figure 13 verify that the converged system performance is the same as long as the location update process follows a general probability distribution with the same mean as in the exponential case. Moreover, we observe that the convergence time increases when the distribution has a longer tail (e.g., the power-law distribution). This is because a small fraction of users have a longer waiting time for location updates when a long-tailed distribution is used.
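A sketch of how waiting times with a common mean of 10 can be drawn from the three distributions compared in Figure 13. The uniform range and the Pareto shape $\alpha = 2.5$ (standing in for the power-law distribution) are assumptions, since the text does not specify them; the Pareto scale is set so that the mean matches.

```python
import random
import statistics

MEAN = 10.0   # mean waiting time between location updates in the simulations


def exponential_wait(rng):
    return rng.expovariate(1.0 / MEAN)


def uniform_wait(rng):
    return rng.uniform(0.0, 2.0 * MEAN)           # uniform on [0, 2 * MEAN]


def power_law_wait(rng, alpha=2.5):
    # Pareto(alpha) scaled by x_m has mean alpha * x_m / (alpha - 1) for alpha > 1,
    # so this choice of x_m gives mean MEAN.
    x_m = MEAN * (alpha - 1) / alpha
    return x_m * rng.paretovariate(alpha)


rng = random.Random(0)
summary = {}
for sampler in (exponential_wait, uniform_wait, power_law_wait):
    draws = [sampler(rng) for _ in range(200_000)]
    summary[sampler.__name__] = (statistics.fmean(draws), max(draws))
    print(sampler.__name__, round(summary[sampler.__name__][0], 2),
          round(summary[sampler.__name__][1], 1))
```

The three means agree, while the largest power-law waiting time is far longer than the others; those rare, very long waits are what slow down convergence with long-tailed update processes.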

VIII. Conclusion 

In this paper, we generalize the spatial congestion game framework for distributed spectrum access 
mechanism design with spatial reuse. We consider both the spatial channel selection game and the 
joint spatial channel selection and mobility game, and propose distributed algorithms using users' local 
information that converge to the Nash equilibria of both games. Numerical results verify that the Nash equilibria are quite efficient and have less than 8% performance loss, compared with the centralized optimal solutions.

Fig. 11. Dynamics of users' locations and channel selections with the temperature $\gamma = 50$
Fig. 12. Dynamics of time average system utility with the location update process following the exponential distribution and the temperature $\gamma = 10$, 20, and 50, respectively

For future work, we will investigate distributed spectrum sharing mechanisms with spatial reuse that can achieve the centralized optimal solution.

Appendix 

A. Proof of Lemma [7] 

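For ease of exposition, we first define $\tilde{p}_i = \log(1 - p_i)$, $C^i_{m,d} = \log(\theta_m B^i_{m,d} p_i)$, and

$$\Phi_i^m(\mathbf{d}, \mathbf{a}) = -\tilde{p}_i\Big(\frac{1}{2}\sum_{j\in\mathcal{N}_i^m(\mathbf{d},\mathbf{a})}\tilde{p}_j + C^i_{m,d_i}\Big)I_{\{a_i = m\}}.$$

Thus, we have $\Phi(\mathbf{d}, \mathbf{a}) = \sum_{i=1}^N\sum_{m=1}^M \Phi_i^m(\mathbf{d}, \mathbf{a})$.

Fig. 13. Dynamics of time average system utility with the location update process following different distributions and the temperature $\gamma = 50$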

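Now suppose that a user $k$ unilaterally changes its strategy $a_k$ to $a_k'$. Let $\mathbf{a}' = (a_1, \ldots, a_{k-1}, a_k', a_{k+1}, \ldots, a_N)$ be the new strategy profile. Then the change in potential $\Phi$ from $\mathbf{a}$ to $\mathbf{a}'$ is given by

$$\begin{aligned}
\Phi(\mathbf{d}, \mathbf{a}') - \Phi(\mathbf{d}, \mathbf{a}) &= \sum_{i=1}^N\sum_{m=1}^M \Phi_i^m(\mathbf{d}, \mathbf{a}') - \sum_{i=1}^N\sum_{m=1}^M \Phi_i^m(\mathbf{d}, \mathbf{a}) \\
&= \sum_{m=1}^M \Phi_k^m(\mathbf{d}, \mathbf{a}') - \sum_{m=1}^M \Phi_k^m(\mathbf{d}, \mathbf{a}) + \sum_{i\in\mathcal{N}_k(\mathbf{d})}\sum_{m=1}^M \Phi_i^m(\mathbf{d}, \mathbf{a}') - \sum_{i\in\mathcal{N}_k(\mathbf{d})}\sum_{m=1}^M \Phi_i^m(\mathbf{d}, \mathbf{a}) \\
&= \Big(\sum_{m=1}^M \Phi_k^m(\mathbf{d}, \mathbf{a}') - \sum_{m=1}^M \Phi_k^m(\mathbf{d}, \mathbf{a})\Big) + \sum_{i\in\mathcal{N}_k(\mathbf{d})}\big(\Phi_i^{a_k'}(\mathbf{d}, \mathbf{a}') - \Phi_i^{a_k'}(\mathbf{d}, \mathbf{a})\big) + \sum_{i\in\mathcal{N}_k(\mathbf{d})}\big(\Phi_i^{a_k}(\mathbf{d}, \mathbf{a}') - \Phi_i^{a_k}(\mathbf{d}, \mathbf{a})\big).
\end{aligned} \qquad (21)$$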



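Equation (21) consists of three parts. Next we analyze each part separately. For the first part, we have

$$\sum_{m=1}^M \Phi_k^m(\mathbf{d}, \mathbf{a}') - \sum_{m=1}^M \Phi_k^m(\mathbf{d}, \mathbf{a}) = -\tilde{p}_k\Big(\frac{1}{2}\sum_{j\in\mathcal{N}_k^{a_k'}(\mathbf{d},\mathbf{a}')}\tilde{p}_j + C^k_{a_k',d_k}\Big) + \tilde{p}_k\Big(\frac{1}{2}\sum_{j\in\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})}\tilde{p}_j + C^k_{a_k,d_k}\Big). \qquad (22)$$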



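For the second part in (21), for each $i \in \mathcal{N}_k(\mathbf{d})$ we have

$$\begin{aligned}
\Phi_i^{a_k'}(\mathbf{d}, \mathbf{a}') - \Phi_i^{a_k'}(\mathbf{d}, \mathbf{a}) &= -\tilde{p}_i\Big(\frac{1}{2}\sum_{j\in\mathcal{N}_i^{a_k'}(\mathbf{d},\mathbf{a}')}\tilde{p}_j + C^i_{a_k',d_i}\Big)I_{\{a_i = a_k'\}} + \tilde{p}_i\Big(\frac{1}{2}\sum_{j\in\mathcal{N}_i^{a_k'}(\mathbf{d},\mathbf{a})}\tilde{p}_j + C^i_{a_k',d_i}\Big)I_{\{a_i = a_k'\}} \\
&= -\frac{1}{2}\tilde{p}_i\Big(\sum_{j\in\mathcal{N}_i^{a_k'}(\mathbf{d},\mathbf{a}')}\tilde{p}_j - \sum_{j\in\mathcal{N}_i^{a_k'}(\mathbf{d},\mathbf{a})}\tilde{p}_j\Big)I_{\{a_i = a_k'\}} = -\frac{1}{2}\tilde{p}_i\tilde{p}_k I_{\{a_i = a_k'\}}.
\end{aligned}$$

This means

$$\sum_{i\in\mathcal{N}_k(\mathbf{d})}\big(\Phi_i^{a_k'}(\mathbf{d}, \mathbf{a}') - \Phi_i^{a_k'}(\mathbf{d}, \mathbf{a})\big) = \sum_{i\in\mathcal{N}_k(\mathbf{d})} -\frac{1}{2}\tilde{p}_i\tilde{p}_k I_{\{a_i = a_k'\}} = -\frac{1}{2}\tilde{p}_k\sum_{i\in\mathcal{N}_k^{a_k'}(\mathbf{d},\mathbf{a}')}\tilde{p}_i. \qquad (23)$$

For the third part in (21), we can similarly get

$$\sum_{i\in\mathcal{N}_k(\mathbf{d})}\big(\Phi_i^{a_k}(\mathbf{d}, \mathbf{a}') - \Phi_i^{a_k}(\mathbf{d}, \mathbf{a})\big) = \frac{1}{2}\tilde{p}_k\sum_{i\in\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})}\tilde{p}_i. \qquad (24)$$

Substituting (22), (23), and (24) into (21), we obtain

$$\begin{aligned}
\Phi(\mathbf{d}, \mathbf{a}') - \Phi(\mathbf{d}, \mathbf{a}) &= -\tilde{p}_k\Big(\sum_{j\in\mathcal{N}_k^{a_k'}(\mathbf{d},\mathbf{a}')}\tilde{p}_j + C^k_{a_k',d_k}\Big) + \tilde{p}_k\Big(\sum_{j\in\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})}\tilde{p}_j + C^k_{a_k,d_k}\Big) \\
&= -\tilde{p}_k\big(U_k(\mathbf{d}, \mathbf{a}') - U_k(\mathbf{d}, \mathbf{a})\big).
\end{aligned} \qquad (25)$$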

Since $0 < p_k < 1$ and hence $-\log(1 - p_k) > 0$, we can conclude that $\Phi(\mathbf{d}, \mathbf{a})$ defined above is a weighted potential function with the weight $-\log(1 - p_k)$. □



B. Proof of Lemma 3

We complete the proof by checking the assumptions of Theorem 2.1 in [26, p. 127].

(a) Since $0 < \theta_m, p_n < 1$ and $b^n_d$ is bounded, $U_n(T)$ must also be bounded. It follows that $|U_n(T)(I_{\{a_n(T)=m\}} - \sigma_n^m(T))| < \infty$. Thus, $\sup_T E\big[|U_n(T)(I_{\{a_n(T)=m\}} - \sigma_n^m(T))|^2\big] < \infty$.

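(b) First, we can obtain from (11) that

$$\frac{\sigma_n^m(T+1) - \sigma_n^m(T)}{\mu_T} = U_n(T)\big(I_{\{a_n(T)=m\}} - \sigma_n^m(T)\big). \qquad (26)$$

By taking the expectation of the RHS of (26) with respect to $\sigma(T)$, we have

$$\begin{aligned}
E\big[U_n(T)(I_{\{a_n(T)=m\}} - \sigma_n^m(T)) \,\big|\, \sigma(T)\big] &= \sigma_n^m(T)\big(1 - \sigma_n^m(T)\big)V_m^n(\sigma(T)) + \sum_{i\ne m}\sigma_n^i(T)\big(-\sigma_n^m(T)\big)V_i^n(\sigma(T)) \\
&= \sigma_n^m(T)\sum_{i=1}^M\sigma_n^i(T)\big(V_m^n(\sigma(T)) - V_i^n(\sigma(T))\big).
\end{aligned}$$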



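(c) First, $V_m^n(\sigma(T)) = E[U_n(T) \mid \sigma(T), a_n(T) = m]$ is an expectation over the other users' mixed strategies, i.e., a multilinear function of $\sigma(T)$, and hence is differentiable. It then follows that $\sigma_n^m(T)\sum_{i=1}^M\sigma_n^i(T)\big(V_m^n(\sigma(T)) - V_i^n(\sigma(T))\big)$ is also differentiable, because sums and products of differentiable functions are differentiable. Thus $\sigma_n^m(T)\sum_{i=1}^M\sigma_n^i(T)\big(V_m^n(\sigma(T)) - V_i^n(\sigma(T))\big)$ is continuous.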

(d) We have $\sum_T \mu_T = \infty$ and $\sum_T \mu_T^2 < \infty$ by assumption.

(e) Since the sample average estimation is unbiased, the noise term is a martingale difference noise. Then the expected bias error $\beta_T = 0$. It follows that $\sum_T \mu_T|\beta_T| < \infty$ with probability one. □
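The identity derived in part (b) can be checked numerically. The sketch assumes only that $a_n(T)$ is drawn from $\sigma_n(T)$ and that $E[U_n(T) \mid a_n(T) = i, \sigma(T)] = V_i^n(\sigma(T))$; the conditional payoffs and the mixed strategy are random placeholders.

```python
import random


def expected_increment(sigma, v, m):
    """E[U (1{a = m} - sigma_m)] when a ~ sigma and E[U | a = i] = v[i]."""
    return sum(sigma[i] * v[i] * ((1.0 if i == m else 0.0) - sigma[m])
               for i in range(len(sigma)))


def derived_form(sigma, v, m):
    """Right-hand side obtained in part (b)."""
    return sigma[m] * sum(sigma[i] * (v[m] - v[i]) for i in range(len(sigma)))


rng = random.Random(0)
raw = [rng.random() for _ in range(5)]
total = sum(raw)
sigma = [x / total for x in raw]                # random mixed strategy
v = [rng.uniform(-2.0, 2.0) for _ in range(5)]  # placeholder conditional payoffs
gaps = [abs(expected_increment(sigma, v, m) - derived_form(sigma, v, m))
        for m in range(5)]
print(max(gaps))
```

The two expressions agree up to round-off because $\sum_i \sigma_n^i = 1$, which is the only fact used in the second equality.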

C. Proof of Lemma 

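Let $\mathbf{a} = (i, \mathbf{a}_{-n})$ and $\mathbf{a}' = (j, \mathbf{a}_{-n})$. By the definition of $V_i^n(\sigma(T))$, we first have that

$$V_i^n(\sigma(T)) = E\Big[\log\Big(\theta_{a_n}B^n_{a_n,d_n}p_n\prod_{n'\in\mathcal{N}_n^{a_n}(\mathbf{d},a_n,\mathbf{a}_{-n})}(1 - p_{n'})\Big) \,\Big|\, a_n = i, \sigma(T)\Big] = \sum_{\mathbf{a}_{-n}}\log\Big(\theta_i B^n_{i,d_n}p_n\prod_{n'\in\mathcal{N}_n^{i}(\mathbf{d},\mathbf{a})}(1 - p_{n'})\Big)\Pr\{\mathbf{a}_{-n} \mid \sigma_{-n}(T)\},$$

and hence

$$V_j^n(\sigma(T)) - V_i^n(\sigma(T)) = \sum_{\mathbf{a}_{-n}}\Big[\log\Big(\theta_j B^n_{j,d_n}p_n\prod_{n'\in\mathcal{N}_n^{j}(\mathbf{d},\mathbf{a}')}(1 - p_{n'})\Big) - \log\Big(\theta_i B^n_{i,d_n}p_n\prod_{n'\in\mathcal{N}_n^{i}(\mathbf{d},\mathbf{a})}(1 - p_{n'})\Big)\Big]\Pr\{\mathbf{a}_{-n} \mid \sigma_{-n}(T)\}. \qquad (27)$$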



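According to (25), we have

$$\begin{aligned}
&\log\Big(\theta_j B^n_{j,d_n}p_n\prod_{n'\in\mathcal{N}_n^{j}(\mathbf{d},\mathbf{a}')}(1 - p_{n'})\Big) - \log\Big(\theta_i B^n_{i,d_n}p_n\prod_{n'\in\mathcal{N}_n^{i}(\mathbf{d},\mathbf{a})}(1 - p_{n'})\Big) \\
&= \frac{1}{-\log(1 - p_n)}\Big[\sum_{k=1}^N -\log(1 - p_k)\Big(\frac{1}{2}\sum_{n'\in\mathcal{N}_k^{a_k'}(\mathbf{d},\mathbf{a}')}\log(1 - p_{n'}) + \log\theta_{a_k'}B^k_{a_k',d_k}p_k\Big) \\
&\qquad - \sum_{k=1}^N -\log(1 - p_k)\Big(\frac{1}{2}\sum_{n'\in\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})}\log(1 - p_{n'}) + \log\theta_{a_k}B^k_{a_k,d_k}p_k\Big)\Big].
\end{aligned} \qquad (28)$$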



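By (27) and (28), it follows that

$$\begin{aligned}
-\log(1 - p_n)\big(V_j^n(\sigma(T)) - V_i^n(\sigma(T))\big) &= \sum_{\mathbf{a}_{-n}}\Pr\{\mathbf{a}_{-n} \mid \sigma_{-n}(T)\}\Big[\sum_{k=1}^N -\log(1 - p_k)\Big(\frac{1}{2}\sum_{n'\in\mathcal{N}_k^{a_k'}(\mathbf{d},\mathbf{a}')}\log(1 - p_{n'}) + \log\theta_{a_k'}B^k_{a_k',d_k}p_k\Big) \\
&\qquad - \sum_{k=1}^N -\log(1 - p_k)\Big(\frac{1}{2}\sum_{n'\in\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})}\log(1 - p_{n'}) + \log\theta_{a_k}B^k_{a_k,d_k}p_k\Big)\Big] \\
&= L_j^n(\sigma(T)) - L_i^n(\sigma(T)),
\end{aligned}$$

which completes the proof. □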



D. Proof of Theorem 2

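We first consider the variation of $L(\sigma(T))$ along the trajectories of the ODE in (12), i.e., we differentiate $L(\sigma(T))$ with respect to time $T$:

$$\begin{aligned}
\frac{dL(\sigma(T))}{dT} &= \sum_{n\in\mathcal{N}}\sum_{i=1}^M \frac{\partial L(\sigma(T))}{\partial\sigma_n^i(T)}\frac{d\sigma_n^i(T)}{dT} \\
&= \sum_{n\in\mathcal{N}}\sum_{i=1}^M L_i^n(\sigma(T))\,\sigma_n^i(T)\sum_{j=1}^M\sigma_n^j(T)\big(V_i^n(\sigma(T)) - V_j^n(\sigma(T))\big) \\
&= \frac{1}{2}\sum_{n\in\mathcal{N}}\sum_{j=1}^M\sum_{i=1}^M \sigma_n^i(T)\sigma_n^j(T)\big(V_i^n(\sigma(T)) - V_j^n(\sigma(T))\big)\big(L_i^n(\sigma(T)) - L_j^n(\sigma(T))\big).
\end{aligned} \qquad (29)$$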

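According to the lemma proved in Appendix C, $L_i^n(\sigma(T)) - L_j^n(\sigma(T)) = -\log(1 - p_n)\big(V_i^n(\sigma(T)) - V_j^n(\sigma(T))\big)$, so every term in (29) is nonnegative and we have $\frac{dL(\sigma(T))}{dT} \ge 0$. Hence $L(\sigma(T))$ is non-decreasing along the trajectories of the ODE (12). According to [37], the learning mechanism converges to a stationary point $\sigma^*$ such that $\frac{dL(\sigma^*)}{dT} = 0$, i.e., for all $i, j \in \mathcal{M}$ and $n \in \mathcal{N}$,

$$\sigma_n^{i*}\sigma_n^{j*}\big(V_i^n(\sigma^*) - V_j^n(\sigma^*)\big)\big(L_i^n(\sigma^*) - L_j^n(\sigma^*)\big) = -\log(1 - p_n)\,\sigma_n^{i*}\sigma_n^{j*}\big(V_i^n(\sigma^*) - V_j^n(\sigma^*)\big)^2 = 0. \qquad (30)$$

According to (12) and (30), we have $\frac{d\sigma_n^{m*}}{dT} = 0$, $\forall m \in \mathcal{M}, n \in \mathcal{N}$.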
If $\sigma^*$ is a Nash equilibrium, it must satisfy

$$V_m^n(\sigma^*) \le \sum_{i=1}^M \sigma_n^{i*}V_i^n(\sigma^*), \quad \forall n \in \mathcal{N}, m \in \mathcal{M}. \qquad (31)$$

If $\sigma^*$ is not a Nash equilibrium, there must be some $n$ and $m$ such that $V_m^n(\sigma^*) > \sum_{i=1}^M \sigma_n^{i*}V_i^n(\sigma^*)$. Due to the continuity of the expectation function $V_m^n$, the inequality still holds in a small open neighborhood around $\sigma^*$. Then it follows from (12) that, for all points $\sigma$ in this neighborhood of $\sigma^*$ that satisfy $\sigma_n^m \ne 0$, we have

$$\frac{d\sigma_n^m/dT}{\sigma_n^m} = V_m^n(\sigma) - \sum_{i=1}^M \sigma_n^i V_i^n(\sigma) > 0.$$

Hence in all sufficiently small neighborhoods of cr*, there will be infinitely many points starting from which 
cr will eventually leave the neighborhood. Thus, the learning mechanism must asymptotically converge 
to a stable stationary point cr* that satisfies (13TI) . which is a Nash equilibrium. Moreover, according to 
the Sard's theorem [|38l . when the ODE (fT2~l) asymptotically converges, the converging equilibrium is not 
contained in the interior of the mixed strategy polytope 11391 . That is, for each user n, there exists only one 
channel selection a* E M. such that cr,™* = 1 if m — a* and cr,™* = otherwise. The learning mechanism 
hence converges to a Nash equilibrium with pure strategy profile. □ 
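A minimal numerical check of (29), assuming that the ODE (12) has the replicator-type form $d\sigma_n^i/dT=\sigma_n^i\big(V_n^i(\sigma)-\sum_j\sigma_n^jV_n^j(\sigma)\big)$ used in the derivation above. The sketch compares the chain-rule form of $dL/dT$ with the symmetrized form, confirms the latter is non-negative, and checks $\partial L/\partial\sigma_n^i=L_n^i(\sigma)$ by central differences (exact up to round-off, since $L$ is linear in each $\sigma_n^i$). The instance, in which every pair of users interferes, and all function names are hypothetical.

```python
import itertools
import math
import random

def setup(N=3, M=3, seed=0):
    """Hypothetical toy instance: fixed locations, every pair of users interferes."""
    rng = random.Random(seed)
    p = [rng.uniform(0.1, 0.9) for _ in range(N)]
    theta = [rng.uniform(0.3, 0.9) for _ in range(M)]
    Q = [[math.log(theta[m] * rng.uniform(1, 10) * p[n]) for m in range(M)]
         for n in range(N)]
    adj = [[i != n for i in range(N)] for n in range(N)]
    return p, Q, adj

def payoff(n, a, p, Q, adj):
    return Q[n][a[n]] + sum(math.log(1 - p[i]) for i in range(len(a))
                            if adj[n][i] and a[i] == a[n])

def potential(a, p, Q, adj):
    total = 0.0
    for k in range(len(a)):
        half = 0.5 * sum(math.log(1 - p[i]) for i in range(len(a))
                         if adj[k][i] and a[i] == a[k])
        total += -math.log(1 - p[k]) * (half + Q[k][a[k]])
    return total

def prob_others(a, sigma, skip):
    """Product of sigma_k^{a_k} over all users k except `skip` (None keeps all)."""
    pr = 1.0
    for k, c in enumerate(a):
        if k != skip:
            pr *= sigma[k][c]
    return pr

def cond_mean(n, m, sigma, f):
    """E[f(a) | a_n = m]; with f = potential this is L_n^m(sigma)."""
    N, M = len(sigma), len(sigma[0])
    return sum(prob_others(a, sigma, n) * f(a)
               for a in itertools.product(range(M), repeat=N) if a[n] == m)

def lyapunov(sigma, phi):
    """L(sigma): expected potential under the mixed profile sigma."""
    N, M = len(sigma), len(sigma[0])
    return sum(prob_others(a, sigma, None) * phi(a)
               for a in itertools.product(range(M), repeat=N))

def check_29(seed=0, h=1e-3):
    p, Q, adj = setup(seed=seed)
    N, M = len(p), len(Q[0])

    def phi(a):
        return potential(a, p, Q, adj)

    rng = random.Random(seed + 1)
    raw = [[rng.random() + 1e-3 for _ in range(M)] for _ in range(N)]
    sigma = [[x / sum(row) for x in row] for row in raw]
    V = [[cond_mean(n, i, sigma, lambda a, n=n: payoff(n, a, p, Q, adj))
          for i in range(M)] for n in range(N)]
    Lc = [[cond_mean(n, i, sigma, phi) for i in range(M)] for n in range(N)]
    # Right-hand side of the assumed replicator-type ODE
    drift = [[sigma[n][i] * (V[n][i] - sum(sigma[n][j] * V[n][j] for j in range(M)))
              for i in range(M)] for n in range(N)]
    chain = sum(Lc[n][i] * drift[n][i] for n in range(N) for i in range(M))
    sym = 0.5 * sum(sigma[n][i] * sigma[n][j] * (V[n][i] - V[n][j]) * (Lc[n][i] - Lc[n][j])
                    for n in range(N) for i in range(M) for j in range(M))
    fd_err = 0.0
    for n in range(N):
        for i in range(M):
            up = [row[:] for row in sigma]
            down = [row[:] for row in sigma]
            up[n][i] += h
            down[n][i] -= h
            grad = (lyapunov(up, phi) - lyapunov(down, phi)) / (2 * h)
            fd_err = max(fd_err, abs(grad - Lc[n][i]))
    return chain, sym, fd_err

chain, sym, fd_err = check_29()
print(chain, sym, fd_err)  # chain == sym >= 0, fd_err at round-off level
```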

E. Proof of Lemma 5

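Suppose that a user $k$ changes its location from $d_k$ to $d'_k$. Let $\mathbf{d}'=(d_1,\ldots,d_{k-1},d'_k,d_{k+1},\ldots,d_N)$. Recall that $\rho_i=\log(1-p_i)$ and $Q^m_{i,d}=\log(\theta_m B^m_{i,d}p_i)$ as defined in Appendix A. Then the change in the potential $\Phi$ from $\mathbf{d}$ to $\mathbf{d}'$ is given by

$$\Phi(\mathbf{d}',\mathbf{a})-\Phi(\mathbf{d},\mathbf{a})=-\rho_k\big(Q^{a_k}_{k,d'_k}-Q^{a_k}_{k,d_k}\big)-\frac{1}{2}\sum_{i=1}^{N}\rho_i\Big(\sum_{j\in\mathcal{N}_i^{a_i}(\mathbf{d}',\mathbf{a})}\rho_j-\sum_{j\in\mathcal{N}_i^{a_i}(\mathbf{d},\mathbf{a})}\rho_j\Big). \tag{32}$$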



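Changing $d_k$ only affects the neighbor sets of user $k$ and of the users in $\mathcal{N}_k^{a_k}(\mathbf{d}',\mathbf{a})\cup\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})$, and for each such user $n$ the change consists only of adding or removing $k$. Since the distance measure is symmetric, $k\in\mathcal{N}_n^{a_n}(\cdot,\mathbf{a})$ if and only if $n\in\mathcal{N}_k^{a_k}(\cdot,\mathbf{a})$. Hence, for the last term, we have

$$\begin{aligned}
&-\frac{1}{2}\sum_{i=1}^{N}\rho_i\Big(\sum_{j\in\mathcal{N}_i^{a_i}(\mathbf{d}',\mathbf{a})}\rho_j-\sum_{j\in\mathcal{N}_i^{a_i}(\mathbf{d},\mathbf{a})}\rho_j\Big)\\
&\quad=-\frac{1}{2}\rho_k\Big(\sum_{j\in\mathcal{N}_k^{a_k}(\mathbf{d}',\mathbf{a})}\rho_j-\sum_{j\in\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})}\rho_j\Big)-\frac{1}{2}\sum_{n\in\mathcal{N}}I_{\{n\in\mathcal{N}_k^{a_k}(\mathbf{d}',\mathbf{a})\cup\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})\}}\,\rho_n\Big(\sum_{j\in\mathcal{N}_n^{a_n}(\mathbf{d}',\mathbf{a})}\rho_j-\sum_{j\in\mathcal{N}_n^{a_n}(\mathbf{d},\mathbf{a})}\rho_j\Big)\\
&\quad=-\frac{1}{2}\rho_k\Big(\sum_{j\in\mathcal{N}_k^{a_k}(\mathbf{d}',\mathbf{a})}\rho_j-\sum_{j\in\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})}\rho_j\Big)-\frac{1}{2}\rho_k\sum_{n\in\mathcal{N}}\rho_n\Big(I_{\{n\in\mathcal{N}_k^{a_k}(\mathbf{d}',\mathbf{a})\}}-I_{\{n\in\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})\}}\Big)\\
&\quad=-\rho_k\Big(\sum_{j\in\mathcal{N}_k^{a_k}(\mathbf{d}',\mathbf{a})}\rho_j-\sum_{j\in\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})}\rho_j\Big).
\end{aligned}\tag{33}$$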

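Combining (32) and (33), and recalling that $U_k(\mathbf{d},\mathbf{a})=Q^{a_k}_{k,d_k}+\sum_{j\in\mathcal{N}_k^{a_k}(\mathbf{d},\mathbf{a})}\rho_j$, we have $\Phi(\mathbf{d}',\mathbf{a})-\Phi(\mathbf{d},\mathbf{a})=-\log(1-p_k)\big(U_k(\mathbf{d}',\mathbf{a})-U_k(\mathbf{d},\mathbf{a})\big)$. □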
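Lemma 5 can be sanity-checked on a random instance. This sketch rests on stated assumptions that are not from the paper: users sit on integer positions of a line, two users interfere when they share a channel and are at most `R` apart, and the rates `B[n][m][x]` are random. It moves one user at a time and records the largest gap between $\Phi(\mathbf{d}',\mathbf{a})-\Phi(\mathbf{d},\mathbf{a})$ and $-\log(1-p_k)\big(U_k(\mathbf{d}',\mathbf{a})-U_k(\mathbf{d},\mathbf{a})\big)$.

```python
import math
import random

def make_instance(N=5, M=2, L=6, R=1, seed=0):
    """Hypothetical instance: positions 0..L-1 on a line, interference range R."""
    rng = random.Random(seed)
    theta = [rng.uniform(0.3, 0.9) for _ in range(M)]
    # B[n][m][x]: rate of user n on channel m at location x (drawn at random)
    B = [[[rng.uniform(1.0, 10.0) for _ in range(L)] for _ in range(M)]
         for _ in range(N)]
    p = [rng.uniform(0.1, 0.9) for _ in range(N)]
    return theta, B, p, L, R

def neighbors(n, d, a, R):
    """N_n^{a_n}(d, a): other users on n's channel within distance R."""
    return [i for i in range(len(d))
            if i != n and a[i] == a[n] and abs(d[i] - d[n]) <= R]

def payoff(n, d, a, inst):
    theta, B, p, _, R = inst
    q = math.log(theta[a[n]] * B[n][a[n]][d[n]] * p[n])
    return q + sum(math.log(1 - p[i]) for i in neighbors(n, d, a, R))

def potential(d, a, inst):
    theta, B, p, _, R = inst
    total = 0.0
    for k in range(len(d)):
        inner = 0.5 * sum(math.log(1 - p[i]) for i in neighbors(k, d, a, R))
        inner += math.log(theta[a[k]] * B[k][a[k]][d[k]] * p[k])
        total += -math.log(1 - p[k]) * inner
    return total

def max_lemma5_gap(trials=1000, seed=1):
    inst = make_instance()
    theta, B, p, L, R = inst
    rng = random.Random(seed)
    N, M = len(p), len(theta)
    worst = 0.0
    for _ in range(trials):
        d = [rng.randrange(L) for _ in range(N)]
        a = [rng.randrange(M) for _ in range(N)]
        k = rng.randrange(N)
        d_new = list(d)
        d_new[k] = rng.randrange(L)  # only user k moves; channels stay fixed
        lhs = potential(d_new, a, inst) - potential(d, a, inst)
        rhs = -math.log(1 - p[k]) * (payoff(k, d_new, a, inst) - payoff(k, d, a, inst))
        worst = max(worst, abs(lhs - rhs))
    return worst

print(max_lemma5_gap())  # expected to be at floating-point round-off level
```

The check depends on the interference relation being symmetric, which is exactly the property used to obtain (33).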
F. Proof of Lemma 6

As mentioned, the system state of the distributed strategic mobility Markov chain is defined as the location profile $\mathbf{d}\in\Theta$ of all users. Since the distance measure is symmetric, if $\mathbf{d}'\in\Delta_{\mathbf{d}}$ then $\mathbf{d}\in\Delta_{\mathbf{d}'}$. Further, since all locations in the spatial domain A are connected, all system states $\mathbf{d}$ can reach each other within a finite number of transitions, and the resulting finite Markov chain is irreducible and aperiodic. The process is thus ergodic and has a unique stationary distribution.

We then show that the Markov chain is time-reversible by checking that the following detailed balance equations are satisfied:

$$\Pr(\mathbf{d},\mathbf{a})\,q_{\mathbf{d},\mathbf{d}'}=\Pr(\mathbf{d}',\mathbf{a})\,q_{\mathbf{d}',\mathbf{d}},\quad\forall\mathbf{d},\mathbf{d}'\in\Theta, \tag{34}$$

where $q_{\mathbf{d},\mathbf{d}'}$ is the transition rate from state $\mathbf{d}=(d_1,\ldots,d_N)$ to state $\mathbf{d}'=(d'_1,\ldots,d'_N)$. According to the algorithm, the states directly connected to state $\mathbf{d}$ are those $\mathbf{d}'$ that differ from $\mathbf{d}$ in exactly one user, say user $n$, such that $d_i=d'_i$ for all $i\neq n$ and $d_n\neq d'_n$.

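Since user $n$ revises its location by the timer mechanism, according to the system state transition rate in (16), we have

$$\Pr(\mathbf{d},\mathbf{a})\,q_{\mathbf{d},\mathbf{d}'}=\frac{e^{\gamma\Phi(\mathbf{d},\mathbf{a})}}{\sum_{\tilde{\mathbf{d}}\in\Theta}e^{\gamma\Phi(\tilde{\mathbf{d}},\mathbf{a})}}\,r_n\,\frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a})}+e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}. \tag{35}$$

Similarly, we obtain

$$\Pr(\mathbf{d}',\mathbf{a})\,q_{\mathbf{d}',\mathbf{d}}=\frac{e^{\gamma\Phi(\mathbf{d}',\mathbf{a})}}{\sum_{\tilde{\mathbf{d}}\in\Theta}e^{\gamma\Phi(\tilde{\mathbf{d}},\mathbf{a})}}\,r_n\,\frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a})}+e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})}}. \tag{36}$$

Since the strategic mobility game is a potential game, we have

$$\Phi(\mathbf{d}',\mathbf{a})-\Phi(\mathbf{d},\mathbf{a})=-\log(1-p_n)\big(U_n(\mathbf{d}',\mathbf{a})-U_n(\mathbf{d},\mathbf{a})\big). \tag{37}$$

The right-hand sides of (35) and (36) share the same denominators, and by (37) their numerators satisfy $\gamma\Phi(\mathbf{d},\mathbf{a})-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a})=\gamma\Phi(\mathbf{d}',\mathbf{a})-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a})$. Combining (35), (36) and (37), the detailed balance equations (34) hold. The Markov chain is hence time-reversible and has the stationary distribution given in (17). □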
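The detailed balance argument of (34) to (37) can be reproduced numerically. The sketch below is hypothetical in both its instance and its move set: each user may step to an adjacent position on a line, and a move of user $n$ from $\mathbf{d}$ to $\mathbf{d}'$ happens at rate $r_n$ times the logit acceptance probability appearing in (35). It computes the Gibbs distribution $\Pr(\mathbf{d},\mathbf{a})\propto e^{\gamma\Phi(\mathbf{d},\mathbf{a})}$ for a fixed channel profile and reports the largest relative mismatch between forward and backward probability flows.

```python
import itertools
import math
import random

# Hypothetical toy instance: N users on positions 0..L-1, fixed channel profile a,
# interference iff same channel and |x_i - x_n| <= R; moves go to adjacent positions.
N, M, L, R, GAMMA = 3, 2, 4, 1, 1.0
rng = random.Random(0)
theta = [rng.uniform(0.3, 0.9) for _ in range(M)]
B = [[[rng.uniform(1, 10) for _ in range(L)] for _ in range(M)] for _ in range(N)]
p = [rng.uniform(0.1, 0.9) for _ in range(N)]
r = [rng.uniform(0.5, 2.0) for _ in range(N)]  # timer rates r_n (assumed values)
a = [0, 1, 0]                                   # fixed channel profile

def interferers(n, d):
    return [i for i in range(N) if i != n and a[i] == a[n] and abs(d[i] - d[n]) <= R]

def payoff(n, d):
    q = math.log(theta[a[n]] * B[n][a[n]][d[n]] * p[n])
    return q + sum(math.log(1 - p[i]) for i in interferers(n, d))

def potential(d):
    total = 0.0
    for k in range(N):
        half = 0.5 * sum(math.log(1 - p[i]) for i in interferers(k, d))
        total += -math.log(1 - p[k]) * (half + math.log(theta[a[k]] * B[k][a[k]][d[k]] * p[k]))
    return total

def rate(d, d2, n):
    """q_{d,d'} when only user n moves: r_n times a logit acceptance probability."""
    w = -math.log(1 - p[n]) * GAMMA
    u_old, u_new = w * payoff(n, d), w * payoff(n, d2)
    top = max(u_old, u_new)  # shift exponents for numerical stability
    return r[n] * math.exp(u_new - top) / (math.exp(u_old - top) + math.exp(u_new - top))

states = list(itertools.product(range(L), repeat=N))
log_w = {d: GAMMA * potential(d) for d in states}
top = max(log_w.values())
Z = sum(math.exp(v - top) for v in log_w.values())
pi = {d: math.exp(log_w[d] - top) / Z for d in states}  # Gibbs distribution

worst = 0.0
for d in states:
    for n in range(N):
        for step in (-1, 1):
            x = d[n] + step
            if 0 <= x < L:
                d2 = d[:n] + (x,) + d[n + 1:]
                fwd = pi[d] * rate(d, d2, n)
                bwd = pi[d2] * rate(d2, d, n)
                worst = max(worst, abs(fwd - bwd) / max(fwd, bwd))
print(worst)  # relative detailed-balance violation, expected at round-off level
```

The acceptance rule only needs the payoff of user $n$ before and after its own move, which is why detailed balance reduces to the weighted-potential identity (37).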

G. Proof of Theorem 6

According to Theorem 2, the distributed learning algorithm converges to a Nash equilibrium of the spatial channel selection game. Let $\mathbf{a}^*_{\mathbf{d}}$ be the Nash equilibrium reached by the distributed learning algorithm when the location profile of all users is $\mathbf{d}$. Since the Nash equilibrium of the potential game is also a maximum point of the potential function, we have $\mathbf{a}^*_{\mathbf{d}}=\arg\max_{\mathbf{a}}\Phi(\mathbf{d},\mathbf{a})$.

Similarly to the analysis of the distributed strategic mobility algorithm in Lemma 6, we define the system state of the joint channel selection and strategic mobility Markov chain as the location profile $\mathbf{d}\in\Theta$ of all users. Since the distance measure is symmetric, if $\mathbf{d}'\in\Delta_{\mathbf{d}}$ then $\mathbf{d}\in\Delta_{\mathbf{d}'}$. Further, since all locations in the spatial domain A are connected, all system states $\mathbf{d}$ can reach each other within a finite number of transitions, and the resulting finite Markov chain is irreducible and aperiodic.

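We then show that the Markov chain is time-reversible by checking that the following detailed balance equations are satisfied:

$$\Pr(\mathbf{d})\,q_{\mathbf{d},\mathbf{d}'}=\Pr(\mathbf{d}')\,q_{\mathbf{d}',\mathbf{d}},\quad\forall\mathbf{d},\mathbf{d}'\in\Theta, \tag{38}$$

where $q_{\mathbf{d},\mathbf{d}'}$ is the transition rate from state $\mathbf{d}=(d_1,\ldots,d_N)$ to state $\mathbf{d}'=(d'_1,\ldots,d'_N)$. According to the algorithm, the states directly connected to state $\mathbf{d}$ are those $\mathbf{d}'$ that differ from $\mathbf{d}$ in exactly one user, say user $n$, such that $d_i=d'_i$ for all $i\neq n$ and $d_n\neq d'_n$. Since user $n$ revises its location by the timer mechanism, the rate of revision is $r_n|\Delta_{d_n}|$. User $n$ randomly chooses a new location $d'_n$ and stays there with probability

$$\frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a}^*_{\mathbf{d}'})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})}+e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a}^*_{\mathbf{d}'})}},$$

so the probability of moving from state $\mathbf{d}$ to $\mathbf{d}'$ is $\frac{1}{|\Delta_{d_n}|}$ times this quantity. Thus, the transition rate from state $\mathbf{d}$ to $\mathbf{d}'$ is

$$q_{\mathbf{d},\mathbf{d}'}=r_n\,\frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a}^*_{\mathbf{d}'})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})}+e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a}^*_{\mathbf{d}'})}}. \tag{39}$$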
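It follows that

$$\Pr(\mathbf{d})\,q_{\mathbf{d},\mathbf{d}'}=\frac{e^{\gamma\Phi(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})}}{\sum_{\tilde{\mathbf{d}}\in\Theta}e^{\gamma\Phi(\tilde{\mathbf{d}},\mathbf{a}^*_{\tilde{\mathbf{d}}})}}\,r_n\,\frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a}^*_{\mathbf{d}'})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})}+e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a}^*_{\mathbf{d}'})}}. \tag{40}$$

Similarly, we obtain

$$\Pr(\mathbf{d}')\,q_{\mathbf{d}',\mathbf{d}}=\frac{e^{\gamma\Phi(\mathbf{d}',\mathbf{a}^*_{\mathbf{d}'})}}{\sum_{\tilde{\mathbf{d}}\in\Theta}e^{\gamma\Phi(\tilde{\mathbf{d}},\mathbf{a}^*_{\tilde{\mathbf{d}}})}}\,r_n\,\frac{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})}}{e^{-\log(1-p_n)\gamma U_n(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})}+e^{-\log(1-p_n)\gamma U_n(\mathbf{d}',\mathbf{a}^*_{\mathbf{d}'})}}. \tag{41}$$

Since the joint channel selection and strategic mobility game is a potential game, we have

$$\Phi(\mathbf{d}',\mathbf{a}^*_{\mathbf{d}'})-\Phi(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})=-\log(1-p_n)\big(U_n(\mathbf{d}',\mathbf{a}^*_{\mathbf{d}'})-U_n(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})\big). \tag{42}$$

Combining (40), (41) and (42), the detailed balance equations (38) hold. The Markov chain is hence time-reversible and has the unique stationary distribution

$$\Pr(\mathbf{d})=\frac{e^{\gamma\Phi(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})}}{\sum_{\tilde{\mathbf{d}}\in\Theta}e^{\gamma\Phi(\tilde{\mathbf{d}},\mathbf{a}^*_{\tilde{\mathbf{d}}})}},\quad\mathbf{d}\in\Theta.$$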

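With a proof similar to that of Theorem @1, we can show that, as $\gamma\to\infty$, the algorithm approaches the equilibrium that maximizes $\Phi(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})$ over the decision variable $\mathbf{d}$, i.e., $\max_{\mathbf{d}}\Phi(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})$. Furthermore, we can show by contradiction that $\max_{\mathbf{d}}\Phi(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})=\max_{\mathbf{d},\mathbf{a}}\Phi(\mathbf{d},\mathbf{a})$. Let $\mathbf{d}^*=\arg\max_{\mathbf{d}}\Phi(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})$ and $(\tilde{\mathbf{d}},\tilde{\mathbf{a}})=\arg\max_{\mathbf{d},\mathbf{a}}\Phi(\mathbf{d},\mathbf{a})$. Suppose that $\Phi(\mathbf{d}^*,\mathbf{a}^*_{\mathbf{d}^*})<\Phi(\tilde{\mathbf{d}},\tilde{\mathbf{a}})$. Since the learning algorithm maximizes the potential $\Phi(\mathbf{d},\mathbf{a})$ given a location profile $\mathbf{d}$, we have $\Phi(\tilde{\mathbf{d}},\mathbf{a}^*_{\tilde{\mathbf{d}}})=\max_{\mathbf{a}}\Phi(\tilde{\mathbf{d}},\mathbf{a})\ge\Phi(\tilde{\mathbf{d}},\tilde{\mathbf{a}})$. It follows that $\Phi(\tilde{\mathbf{d}},\mathbf{a}^*_{\tilde{\mathbf{d}}})\ge\Phi(\tilde{\mathbf{d}},\tilde{\mathbf{a}})>\Phi(\mathbf{d}^*,\mathbf{a}^*_{\mathbf{d}^*})$, which contradicts $\Phi(\mathbf{d}^*,\mathbf{a}^*_{\mathbf{d}^*})=\max_{\mathbf{d}}\Phi(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})\ge\Phi(\tilde{\mathbf{d}},\mathbf{a}^*_{\tilde{\mathbf{d}}})$. Since the joint spatial channel selection and mobility game is a potential game, the maximum point $(\mathbf{d}^*,\mathbf{a}^*_{\mathbf{d}^*})$ of the potential function must be a Nash equilibrium. □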
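The limiting argument can be illustrated on a tiny instance. Like the proof, the sketch assumes the learning stage returns $\mathbf{a}^*_{\mathbf{d}}=\arg\max_{\mathbf{a}}\Phi(\mathbf{d},\mathbf{a})$, found here by brute force; the line-based interference model and all parameters are hypothetical. It shows that the stationary mass on the maximizers of $\Phi(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})$ grows with $\gamma$, and that $\max_{\mathbf{d}}\Phi(\mathbf{d},\mathbf{a}^*_{\mathbf{d}})=\max_{\mathbf{d},\mathbf{a}}\Phi(\mathbf{d},\mathbf{a})$.

```python
import itertools
import math
import random

# Hypothetical toy instance: N users on positions 0..L-1, M channels,
# interference iff same channel and |x_i - x_n| <= R.
N, M, L, R = 3, 2, 4, 1
rng = random.Random(2)
theta = [rng.uniform(0.3, 0.9) for _ in range(M)]
B = [[[rng.uniform(1, 10) for _ in range(L)] for _ in range(M)] for _ in range(N)]
p = [rng.uniform(0.1, 0.9) for _ in range(N)]

def potential(d, a):
    """Phi(d, a) = sum_k -log(1-p_k) * (0.5 * sum_{i in N_k} log(1-p_i) + log(theta B p_k))."""
    total = 0.0
    for k in range(N):
        half = 0.5 * sum(math.log(1 - p[i]) for i in range(N)
                         if i != k and a[i] == a[k] and abs(d[i] - d[k]) <= R)
        total += -math.log(1 - p[k]) * (half + math.log(theta[a[k]] * B[k][a[k]][d[k]] * p[k]))
    return total

loc_profiles = list(itertools.product(range(L), repeat=N))
chan_profiles = list(itertools.product(range(M), repeat=N))
# Phi(d, a*_d), ASSUMING the learning stage returns a*_d = argmax_a Phi(d, a)
phi_star = {d: max(potential(d, a) for a in chan_profiles) for d in loc_profiles}

def gibbs(gamma):
    """Pr(d) proportional to exp(gamma * Phi(d, a*_d)), shifted for stability."""
    top = max(phi_star.values())
    w = {d: math.exp(gamma * (v - top)) for d, v in phi_star.items()}
    z = sum(w.values())
    return {d: x / z for d, x in w.items()}

best = max(phi_star.values())
joint_best = max(potential(d, a) for d in loc_profiles for a in chan_profiles)
maximizers = [d for d, v in phi_star.items() if v == best]

for gamma in (0.5, 2.0, 10.0):
    mass = sum(gibbs(gamma)[d] for d in maximizers)
    print(f"gamma={gamma}: Pr(maximizer set) = {mass:.4f}")
print(best == joint_best)  # max_d Phi(d, a*_d) equals max over (d, a)
```

For a Gibbs distribution, the derivative of the log-probability of the maximizer set with respect to $\gamma$ equals $\max\Phi-\mathbb{E}[\Phi]\ge 0$, so the printed mass is non-decreasing in $\gamma$.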

H. Proof of Theorem 7

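For ease of exposition, we first define $E_n(\mathbf{d})=\max_{m\in\mathcal{M}}\{\log(\theta_m B^m_{n,d_n}p_n)\}$ and $E(\mathbf{d})=\min_{n\in\mathcal{N}}\{E_n(\mathbf{d})\}$. Since $0<p_n<1$, we have $\log(1-p_n)<0$. It follows from the definition of $U_n(\mathbf{d},\mathbf{a})$ that

$$U_n(\mathbf{d},\mathbf{a})\le E_n(\mathbf{d}). \tag{43}$$

Thus,

$$\max_{\mathbf{a}}\sum_{n\in\mathcal{N}}U_n(\mathbf{d},\mathbf{a})\le\sum_{n\in\mathcal{N}}E_n(\mathbf{d}). \tag{44}$$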
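Suppose that $\mathbf{a}$ is an arbitrary Nash equilibrium of the spatial channel selection game. Then at the Nash equilibrium, we must have

$$U_n(\mathbf{d},\mathbf{a})\ge E_n(\mathbf{d})+\sum_{i\in\mathcal{N}_n(\mathbf{d})}\log(1-p_i). \tag{45}$$

Otherwise, user $n$ could always improve its payoff by choosing the channel that maximizes $\log(\theta_m B^m_{n,d_n}p_n)$. According to (44) and (45), we then obtain

$$\begin{aligned}
\mathrm{PoA}&\ge\frac{\sum_{n\in\mathcal{N}}U_n(\mathbf{d},\mathbf{a})}{\max_{\mathbf{a}'}\sum_{n\in\mathcal{N}}U_n(\mathbf{d},\mathbf{a}')}
\ge\frac{\sum_{n\in\mathcal{N}}\Big(E_n(\mathbf{d})+\sum_{i\in\mathcal{N}_n(\mathbf{d})}\log(1-p_i)\Big)}{\sum_{n\in\mathcal{N}}E_n(\mathbf{d})}\\
&=1+\frac{\sum_{n\in\mathcal{N}}\sum_{i\in\mathcal{N}_n(\mathbf{d})}\log(1-p_i)}{\sum_{n\in\mathcal{N}}E_n(\mathbf{d})}
\ge 1+\frac{\sum_{n\in\mathcal{N}}\sum_{i\in\mathcal{N}_n(\mathbf{d})}\log(1-p_i)}{NE(\mathbf{d})},
\end{aligned}$$

where the last step uses $\sum_{n\in\mathcal{N}}E_n(\mathbf{d})\ge NE(\mathbf{d})$. □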
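The bound can be compared with the ratio attained at one equilibrium of a small random instance. In this hypothetical sketch, locations are fixed and encoded by a symmetric matrix `nbr` (playing the role of the sets $\mathcal{N}_n(\mathbf{d})$), an equilibrium is found by asynchronous best responses, which terminate because every strict improvement raises the weighted potential, and the optimum is found by brute force. The parameters are chosen so that $E_n(\mathbf{d})+\sum_{i\in\mathcal{N}_n(\mathbf{d})}\log(1-p_i)>0$ for every user, which keeps all the quantities in the chain of inequalities positive.

```python
import itertools
import math
import random

# Hypothetical instance with fixed locations. Qm[n][m] = log(theta_m * B^m_{n,d_n} * p_n).
N, M = 5, 2
rng = random.Random(3)
p = [rng.uniform(0.05, 0.2) for _ in range(N)]
theta = [rng.uniform(0.5, 0.9) for _ in range(M)]
Qm = [[math.log(theta[m] * rng.uniform(100, 200) * p[n]) for m in range(M)]
      for n in range(N)]
nbr = [[False] * N for _ in range(N)]  # nbr[n][i]: i is within range of n
for n in range(N):
    for i in range(n + 1, N):
        nbr[n][i] = nbr[i][n] = rng.random() < 0.5

def payoff(n, a):
    """U_n(d, a): own term plus log(1 - p_i) for co-channel users in range."""
    return Qm[n][a[n]] + sum(math.log(1 - p[i]) for i in range(N)
                             if nbr[n][i] and a[i] == a[n])

def welfare(a):
    return sum(payoff(n, a) for n in range(N))

def best_response_dynamics(a):
    """Asynchronous strict improvements; stops at a pure Nash equilibrium."""
    a = list(a)
    improved = True
    while improved:
        improved = False
        for n in range(N):
            current = payoff(n, a)
            for m in range(M):
                trial = a[:n] + [m] + a[n + 1:]
                if payoff(n, trial) > current + 1e-12:
                    a, current, improved = trial, payoff(n, trial), True
    return a

ne = best_response_dynamics([0] * N)
opt = max(welfare(a) for a in itertools.product(range(M), repeat=N))
E_n = [max(Qm[n]) for n in range(N)]
E = min(E_n)
S = sum(math.log(1 - p[i]) for n in range(N) for i in range(N) if nbr[n][i])
bound = 1 + S / (N * E)
print(welfare(ne) / opt, bound)  # the equilibrium ratio should not fall below the bound
```

Since the bound holds for every Nash equilibrium, it also lower-bounds the worst one; this sketch only evaluates the single equilibrium reached from the all-zero profile.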